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Summary of Research 


The research reported here is a part of NASA’s Synthetic Vision System (SVS) project for the 
development of a High Speed Civil Transport Aircraft (HSCT). One of the components of the 
SVS is a module for detection of potential obstacles in the aircraft’s flight path by analyzing the 
images captured by an on-board camera in real-time. Design of such a module includes the 
selection and characterization of robust, reliable, and fast techniques and their implementation 
for execution in real-time. This report describes the results of our research in realizing such a 
design. It is organized into three parts as described below. 

Part I. Data modeling and camera characterization: A critical component of the vision 
system is the imaging camera. Understanding the imaging characteristics of the camera as well 
as its limitations based on an accurate model is the first step in the design of the complete 
system. In this part of the report we describe a systematic procedure and an experimental 
protocol to measure the spatial and temporal noise in a digital camera. Specifically, we describe 
the model and the measured characteristics of a Kodak Megaplus ES 1.0 digital camera 
including its dark-field response, photo response nonuniformities, charge transfer efficiency, and 
inter-pixel and other noises. 

Part n. Algorithms for detecting airborne obstacles: Methods used for detecting airborne 
obstacles using image sequences obtained from a camera mounted on a test aircraft are described 
in this part. The performance of detection algorithms is characterized in the presence of camera 
noise using theoretical and experimental methods. The problem of hazard detection in the 
presence of background clutter in the image either due to clouds or the landscape below the 
horizon is addressed. Algorithm fusion to overcome the limitation of individual algorithms is 
studied. The image processing and tracking algorithms are described in this part and their 
implementation details are presented in Part III. 

Part HI. Real-time implementation of obstacle detection algorithms on the Datacube 
MaxPCI architecture: We describe the computational requirements and time-complexities of 
the target detection algorithms and their implementation in a parallel/pipe-line architecture. In 
particular, we describe results of our implementation of these algorithms on the Datacube 
MaxPCI architecture. We describe the results of the flight tests conducted to evaluate the real- 
time performance of the system. 

A list of publications resulting from this grant as well as a list of relevant publications resulting 
from prior NASA grants on this topic are presented in the following pages. This research did 
not result in any inventions. 
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Part I 


Data Modeling and Camera Characterization 

Abstract 

In this part, the procedure we used to model the noise characteristics of a digital 
CCD camera is described in detail. The functioning of a CCD is described, along with 
the various sources of noise present in the camera system. A systematic procedure is 
developed to measure the spatial and temporal noise of the camera, and the results are 
shown in detail. Finally, the measurement of spatial frequency response of the camera 
system and the validation of various noise models are proposed as future work. 
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Chapter 1 

Introduction and Motivation 


This part of the report describes the progress of research at Penn State’s Computer Vision 
Lab to develop a simple but accurate method for characterizing and removing the noise 
introduced by a digital CCD camera. Digital CCD cameras offer superior performance as 
compared to their analog counterparts. For example, digital cameras are free of the spatial 
inconsistencies between rows and between frames (i.e., jitter) that may be caused by video 
clock instabilities. By its nature, a digital imaging system is also highly immune to the 
spatial and temporal artifacts that may be introduced by transmission-line noise. As noted 
in Section 3 below, however, several noise processes may still be encountered in such a system. 
The goal of the modeling and characterization of the camera described here is to enhance 
the operation of a system for airborne obstacle detection. As an example, consider a Cessna 
aircraft that has a length and wing-span of approximately 9 m (30’) and the fuselage diameter 
of approximately 1.2 m (4’) [5]. The detection algorithm must be capable of detecting this 
small target at least 25 seconds prior to a possible collision to allow for corrective actions by 
the pilot. Assuming that both aircrafts are traveling at 125 m/s (250 knots), their relative 
velocity can be as high as 250 m/s (500 knots). In such case, they would be 6.25 km 
(3.5 nautical miles) apart 25 seconds before collision. Using a camera with a resolution of 
60 pixels per degree, the image of the aircraft is of 5.0 x 0.7 pixels from a side view, but only 
0.7 x 0.7 pixels from a front view. It is clear that safety demands that noise effects even at 
the sub-pixel level must be accounted for and compensated as much as possible. 
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Chapter 2 

CCD Operation and Noise Sources 


This chapter describes the theory of operation of a CCD camera, followed by the descript ion 
of various noise sources affecting a CCD camera. 

2.1 Basics of CCD theory of operation 

In this section, the theory of CCD operation is presented, and terms are defined. Variants of 
CCD architecture are compared, and refinements such as blooming suppression are explained 
[2], Discussion will lead to specific features of Kodak ES 1.0 camera. 

2.1.1 Theory of CCD operation 

The charge-coupled device (CCD) first appeared in a 1970 Bell Labs technical report. Its 
usefulness in both analog and digital electronics was recognized at once, and CCDs have 
been used, for example, in signal processing applications such as delay lines for both analog 
and digital signals. Since about 1980, however, the term CCD has become synonymous 
with video imaging for both the mass-produced consumer and the top-performance scientific 
markets. Although CMOS imaging devices, offering a one-chip solution to image capture 
and processing, are about to enter the consumer market, it seems certain that CCDs will 
continue to dominate the high-performance imaging market for some time to come. 

To start with, it should be noted that the CCD is an analog device, and not a digital 
one. It is true that the operative quantities in the CCD are charges, and that because these 
charges occur in quantizable form as electrons, there is a discrete character to the device’s 
operation. Any semiconductive device, however, operates by the transfer of charge carriers. 
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Although at the lowest level all such operation is discretizable, it is only when we associate 
an information content to a transition between discrete levels of much greater magnitude 
- a transition that is largely unaffected by the noise processes inherent in all such physical 
systems - that we call such a device ‘digital.’ 

In its simplest form, a CCD comprises an array of charge storage sites, wherein each 
storage site is an MOS capacitor as shown in Figure 2.1 (a). An MOS (for metal-oxide- 
semiconductor) capacitor comprises an insulating layer of silicon oxide sandwiched between 
a metallic (e.g. aluminum) gate and a silicon substrate, which has been doped into semi- 
conductivity with an excess of p-type carriers (holes). Typically, the gate is made of degen- 
erativelv doped polycrystalline silicon (or polvsilicon) instead of a metal. Output leads are 
bonded via ohmic contacts to the gate and the substrate. 

When a potential is applied between the gate and substrate, a region develops in the 
substrate underlying the gate that is swept free of p-type carriers by electrostatic repulsion 
(Figure 2.1 (b)). Any electrons that may appear in this region (e.g., via injection or gener- 
ation) will be attracted to the gate and thus will congregate below the oxide layer. Because 
the p-type carriers have been repelled from this region, the electrons are protected against 
recombination, and the quantity of charge which they represent in the aggregate called a 
charge packet will be preserved indefinitely. This region is called the depletion region, and 
the electrostatic barrier that defines it is called a potential well. 

Each storage site can hold only a finite number of electrons before it begins to overflow. 
This number is called the ‘full- well’ capacity and generally (for CCD imagers) ranges from 
about 20,000 to about 100,000. 

Once the charge packet has been formed, it remains to pass the packet along the array 
from one storage site to the next without altering its contents. The information represented 
by the value of the charge packet cannot be known until that value can be outputted from 
the chip. The basic and most common mechanism for the transfer of charge packets is the 
three-phase clocked approach shown in Figure 2.2. The first phase is the application of a 
potential to the A sites, creating potential wells. The second phase is to apply a potential to 
the B sites as well, thereby spreading the charge packet between the A and B sites. The third 
phase is to remove the potential from the A sites, which completes the process of moving 
the charge packets from the A sites to the B sites. In the next three phases, the packets are 
moved from the B sites to the C sites, and so on. From this description, one may understand 
why early CCD delay lines were referred to as ‘bucket brigade devices.’ 

Although one of every three sites is not used during each cycle, one may see that this 
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Figure 2.1: (a) A Metal Oxide Semiconductor (MOS) capacitor, (b) Depletion region in the 
MOS capacitor, when a potential is applied between the gate and the substrate. 



Figure 2.2: Charge transfer in a three-phase device. This represents one column. Rows go 
into the paper. Six steps are required to move the charge one pixel [2], 
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empty site provides the necessary function of separating each charge packet from the next 
one. Likewise, one may see that when a three-phase clocking method is used, the available 
site size (and therefore the maximum site capacity) is reduced to one-third. For video 
applications, for example, each pixel must be broken into three areas, only one of which may 
be optically active, so sensitivity is necessarily reduced. 

Because of the capacity and sensitivity constraints of the three-phase system, other ap- 
proaches have also been developed. By changing only the clocking method, for example, 
a four-phase approach may be used which allows an adjacent two of every four sites to be 
active at a time, thereby increasing pixel capacity to one-half. One may obtain two-phase 
operation by modulating the thickness of the oxide layer, and chip designs that permit one- 
phase clocking also exist. It is believed that our Kodak MEGAPLUS ES 1.0 camera uses a 
three-phase clock, but the engineers we spoke to could not definitively confirm this feature. 

In a two-dimensional array, the packets in each column are transferred as described above, 
each column acting independently but in synchronization with all of the others. Each column 
empties into a shift-register row which operates in the same fashion as the columns but on 
a different timing scheme. That is, every time the column charge packets travel one site 
down the array, the column transfer operation must pause while the entire shift register is 
transferred sequentially through the terminal site. This terminal site, the last site on the 
array, is a diode which converts each incoming charge packet into a potential (i.e., a voltage). 
The stream of varying potentials may be amplified before being outputted from the chip as 
a raster image signal for further processing (including digitization). 

2.1.2 CCD imaging architectures 

In an imaging CCD, the charge packets are created by the photoexcitation of bound electrons 
into a free state by incident photons, and the subsequent migration of these free electrons 
into the depletion region. (As an aside, we note that the depletion region is usually so 
shallow that few free electrons are actually generated within it.) So long as light is incident 
on the array, this process will continue. One may easily realize that, the continuation of this 
process after image capture and during the transfer of the charge packets would cause image 
degradation. 

One solution to this problem is the use of a mechanical shutter synchronized to the 
capture/transfer timing. A better solution, called ‘electronic shuttering,’ uses different areas 
of the chip for capture and transfer. The transfer gates are covered with an opaque mask 
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Figure 2.3: Basic architecture for a full frame CCD [2]. 


so that the packets being transferred will not be corrupted. As a consequence, the array 
may be illuminated constantly without affecting the transfer process, thus increasing camera 
sensitivity. On the other hand, it should be noted that division of the chip into two areas 
necessarily decreases the area used for image capture, thereby decreasing camera sensitivity. 

At least two main divisional configurations exist. In a frame transfer configuration, the 
active and shielded areas of the chip are completely separated, as shown in Figure 2.3. At 
the end of the capture period, the packets from all of the active sites are simultaneously 
dumped into corresponding sites in the shielded array, and the transfer process begins. Note 
that a new capture period may begin at the same time the transfer process begins. Because 
of the physical concentration of the active sites and the high area sensitivity that results, 
high-performance cameras for scientific applications usually contain frame transfer CCDs. 

In the interline transfer configuration, lines of storage sites for image transfer are fab- 
ricated next to each line of active sites, as shown in Figure 2.4. Photodiodes rather than 
MOS capacitors are most often used for the active sites in such arrays (as is the case in our 
Kodak camera). The main disadvantage of this configuration is that as little as 20 % of the 
chip area may be available for image generation, resulting in a severe loss of sensitivity. For 
this reason, a microlens array is usually positioned adjacent to the chip surface to increase 
the ‘fill factor.’ The microlens array contains one lens for each pixel, which focuses the light 
incident on the entire pixel onto the area of the active site. 
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Figure 2.4: Interline transfer architecture. The charge is rapidly transferred to interline 
transfer registers via the transfer gate. The registers may have three or four gates [2]. 


2.1.3 Other features of CCD camera chips 

If the depletion region of an active site is filled beyond capacity, the excess charge will 
spill over and contaminate adjacent sites in a process called ‘blooming.’ Spill-over between 
adjacent columns is prevented during fabrication, but a particularly strong local illumination 
may saturate most of the length of the affected columns. Blooming suppression combats this 
effect with anti-blooming drains that suddenly flatten the response of the active site above 
a certain intensity. 

In some cameras, such as our Kodak camera, the bias point of the anti-blooming drains 
is variable. While reducing the bias point necessarily reduces the array sensitivity, in some 
applications such a tradeoff may be acceptable. In cameras used to monitor automobile 
traffic, for example, increased resistance to blooming caused by headlights may be worth 
some loss of sensitivity. 

Our Kodak camera also has a dual-channel transfer configuration. In this structure, the 
even and odd rows of the array are processed and outputted through two separate channels. 
This configuration allows faster data throughput: our camera can supply 30 frames of 1000 x 
1000 pixels per second, while a single-channel version of the same camera can only supply 
15 frames per second. With respect to noise analysis, however, it is important to recognize 
the fact that pixels in adjacent rows may not be subject to the same noise processes at any 
given time. This disparity is especially important when considering frequency-dependent, 




processes, such as the Modulation Transfer Function (MTF) as discussed in Section 8 below. 


2.2 Noise sources in a digital CCD camera 

In this section, various types of noise sources in a CCD camera are discussed [2], Dark and 
photoresponse noises are distinguished. CCD-generated noise (e.g. fixed pattern noise) and 
noise generated by support electronics (e.g. readout noise) are distinguished. Other noises 
such as interpixel effects (e.g. blooming and smear) and optical effects such as point-spread 
functions are also discussed. 

2.2.1 Overview of relevant noise processes 

Although in-camera digitization offers good protection against transmission-line noise, the 
signal outputted by the camera is only a flawed representation of the image which is incident 
on the CCD array. For one thing, the photosites are like snowflakes in that none is exactly 
like any one of the others, and each site will respond somewhat differently to the same level 
of luminous excitation. 

Likewise, in any such device that is operating above absolute zero, electrons are generated 
thermally as well as optically. Once generated, each electron is indistinguishable from any 
other, so some portion of the charge packet is necessarily always invalid. Moreover, each site 
responds differently to this noise process as well. 

Finally, the on-chip paths by which the charge packets are read out from the chip and 
converted into potential values, and the off-chip circuitry through which these signals are 
amplified and digitized, introduce errors of their own that may vary with signal amplitude 
and frequency. An illustration of the imaging system path and some of the noise processes 
associated with each step is presented in Figure 2.5. 

2.2.2 Dark-field response and nonuniformities 

As mentioned above, the electrons that migrate into the depletion region may be generated 
by thermal as well as photoelectric processes. Therefore, some signal will be outputted even 
when the array is in total darkness. The result of this phenomenon is called the camera’s 
dark-field response. 

Dark-field response will vary from pixel to pixel. This noise process is also extremely 
temperature-dependent: for example, the noise level doubles when the array temperature 
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Figure 2.5: An illustration of the imaging system path and some of the noise processes 
associated with each step 


increases by 8 to 9 degrees Celcius. For this reason, measurements should be taken only after 
the camera has warmed up. (In order to obtain an accurate measurement of the dark field, 
we believe that it is also important to allow the array to warm up under normal operating 
illumination conditions. A focused image of any intensity will certainly affect the surface' 
temperature of the array, and thereby influence the dark component of the total response.) 

The magnitude of the dark-field response is also, of course, linearly dependent on the 
exposure time, i.e., the period of time during which electrons are being collected. Although 
thermal electrons may also potentially corrupt the charge packets during the transfer process, 
transfer across the chip occurs so rapidly that this quantity is usually ignored. 

Fortunately, the thermal noise process is simply additive. So long as we can reliably 
estimate the number of such electrons collected at a particular site, it is a trivial matter to 
subtract that measure from the gross response. 

CCD chips are usually (if not universally) fabricated so that, some of the active sites on 
the periphery are shielded from illumination. (An area of isolation pixels also separates these 
dark pixels from the active ones in order to prevent light leakage.) During image processing, 
the values returned by the dark pixels may be used to calculate an estimate of the magnitude 
of the array’s dark response, which may then be subtracted from the outputted image. As 
the response of each active site is unique, however, the accuracy of this approach to dark-field 
compensation is not optimal. 
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Our Kodak camera includes a feature called ‘dark-clamping’ whereby such an estimate 
of the dark field is automatically subtracted from the image. While not exactly accurate 
and therefore not entirely appropriate for our present purposes, this feature is considered a 
significant advantage for consumer applications. In ‘dark-clamping,’ the dark-field estimate 
is automatically calculated and subtracted, so that the camera’s output has already been 
compensated. As the process is transparent and occurs directly at the chip output, it is not 
necessary to keep track of exposure time or temperature. It may be possible to disable this 
function in our camera via a software command, but the particular Kodak engineers with 
such knowledge have so far been unresponsive to our requests. 

2.2.3 Photoresponse nonuniformity 

Just as the active sites vary in their response to thermal excitation, they also vary in their 
photoresponse. In other words, each pixel will react differently to the same level of incident 
illumination. The degree by which the number of photoelectrons collected by a particular 
site varies from an arbitrary standard amount may be thought of as a local ‘gain factor,’ as 
this noise process is multiplicative with respect to the level of excitation and the response of 
each site is generally quite linear. 

The dark-field and photoresponse nonuniformities together comprise the array’s ‘fixed- 
pattern noise.’ Generally, fixed-pattern noise is defined only at each pixel and has no spatially 
varying component. In other words, there will be no correlation between the fixed-pattern 
noise at two adjacent pixels. However, we note that there will usually (if not always) be a low- 
frequency component to the photoresponse nonuniformity, caused by unavoidable variations 
in the substrate thickness. These variations cause photons of the same wavelength to interact 
differently at the quantum level at different points on the array. This effect is not a separate 
factor, though, and is incorporated into the general photoresponse nonuniformity. 

2.2.4 Charge-transfer efficiency 

Although each transfer gate successfully moves well over 99.99 % of each charge packet to the 
next gate in the column, some small amount of charge stays behind. When the incident image 
contains sharp (i.e., high-frequency) transitions between areas varying greatly in amplitude, 
this process will tend to filter out the high spatial frequencies by blurring the transitions. 
This effect increases with array size, i.e., with the number of transfers required to move each 
packet off of the array. Charge-transfer efficiency is a component of the array’s frequency 
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response and therefore is included in the array’s modulation transfer function (Section 8). 


2.2.5 Other interpixel noise mechanisms 

As discussed in Section 2 above, blooming occurs when the charge in a saturated pixel 
overflows into adjacent pixels in the column. By the anti-blooming measures described above, 
the post-saturation response can be largely reduced, but it cannot be eliminated. The extent 
to which a charge packet has been corrupted by overflow is of course indeterminable, and in 
processing the resulting images this effect must be kept in mind. Whenever a saturated pixel 
is encountered, the signal outputted from its column neighbors must be considered suspect 
and possibly corrupted. 

Another source of interpixel noise, called ‘smear’ (or sometimes tunneling), occurs when 
photoelectrons generated at one site migrate instead into a neighboring site. (This process 
is quite different from the process, also called ‘smear,’ which occurs when transfer occurs in 
a non-shielded array while the array is still being illuminated.) As this process is related 
to signal frequency, we would expect it to be included in the array’s modulation transfer 
function (Section 8). 

2.2.6 Reset noise 

Reset noise arises when the capacitor which converts the charge packets into potential values 
is not completely reset between packets. As this noise is highly temperature-dependent, it is 
also referred to as kTC noise, where k is the Boltzmann constant, T is the temperature and 
C is the capacitance of the device. In most if not all CCDs manufactured today, this noise 
process has been virtually eliminated through the use of correlated double sampling (CDS), 
whereby two samples are taken from each packet and averaged to remove the reset error. 

2.2.7 Readout noise 

As the signal generated by the CCD array is exceedingly small, it must be amplified before 
processing. Each of the amplification and processing stages necessarily introduces its own 
noise process, which will generally be dependent on temperature and signal frequency. On 
the whole, though, this noise is random in time and cannot be compensated. 
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2.2.8 Quantization noise 

Conversion of the analog array signal into a digital quantity necessarily results in a certain 
loss of information. While this noise is completely random and unknowable, it is easily 
characterized as a zero-mean process whose variance is a function of the number of bits in 
the digital output. 
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Chapter 3 

Estimation of CCD Noise 


This chapter describes the methods for estimating the parameters of the temporal as well as 
the fixed pattern noise of a CCD camera. Healv-Kondepudy noise estimation procedure is 
used to estimate the temporal noise. An experimental protocol is developed for estimation 
of fixed pattern noise, and the detailed mathematical analysis for least squares estimation 
of the noise is presented. Results of noise estimation using the Kodak ES 1.0 camera are 
described. 


3.1 Description of Healey-Kondepudy noise estimation 
procedure 

Following a common model of CCD behavior, Healey and Kondepudv [1] express the digital 
output D at each pixel as 

D — (hi + Edc + Ng + Nr ) A + Nq , 

where 

• K is a factor that characterizes the pixel’s photoresponse, 

• I is the incident illumination, 

• Edc is the expected number of dark electrons, 

• Ns is the shot noise, 

• Nr is the readout noise, 
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• A is the analog gain, and 

• Nq is the quantization error. 

To reduce this expression, Healey and Kondepudv make the following three assumptions: 

1. The photoresponse factor A is very close to 1 for all pixels, 

2. The expected number of dark electrons Eoc is nearly constant across the array, and 

3. The incident illumination / is nearly constant across the array. 

Using these assumptions, and representing the image-wide means of I and Eoc as 7 and 
E oc , the expression for D is reduced to the form 

D — fi + A. 


where 

/; = .4(7 + Eoc) 


and N is a zero-mean random variable characterized by 

o\- = - 4 2 (7 + Eoc) + Ef,. 

Here the noise term of, is assumed independent of the number of collected electrons: 


(T, 


C 


1 2 : ’i , 

4<Tk + IS' 


These relations imply 


ojf = An + (J 2 C , 

so by taking the differences between pairs of similar images (i.e. /q = // 2 ). Healey and 
Kondepudy estimate the parameters ,4 and erf, . 


3.2 Development of pattern noise (flat-field) experi- 
mental protocol 

In this section, a history of what we have observed is presented, including the drawbacks 
of our previous setups, and concluding with a detailed description of the final test bench. 
Possible sources of error are noted, for example, the failure to consider spectral content, and 
the use of neutral density filters in the optical path. 
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3.2.1 Fluorescent sources 

As a first step, we resolved to determine the flat-field response of our camera: i.e., the level 
of interpixel variation when each site in the entire array was presented with the same level of 
excitation. Because we had a relatively large computing capacity available, it seemed that 
we could capture and process large numbers of images fairly quickly, averaging the responses 
over time in order to eliminate temporal variations, and thereby develop a flat-field model 
that could easily be verified. Obtaining a. field illumination that was uniform in both time 
and space, however, turned out to be problematic, as those with more optical engineering 
experience might already know^. 

First, we concentrated on using reflected excitation. We reasoned that if transmitted 
light w 7 ere used, it would be impossible to completely remove the image of the light source 
from our field, even through diffusing sheets and a defocused lens. Therefore, reflection from 
a Lambertian surface appeared more promising in this regard. 

Although indoor fluorescent lighting is prevalent and apparently very uniform, it im- 
mediately became obvious that a fluorescent lighting source would not be suitable. The 
accuracy of our results, and specifically their immunity to temporal variations, would de- 
pend on our ability to collect a large number of images under identical excitation conditions. 
Conventional fluorescent lights, of course, flicker at approximately a 60 Hz rate, rendering the 
level of illumination across any sequence of shuttered images very non-uniform and therefore 
unusable for our purposes. 

High-frequency fluorescent sources are available, being priced at about $1,000 for a 10- 
inch square diffuse source operating at 15 kHz or higher. Even through a diffusing layer, 
however, such sources are not uniform enough to present a transmitted flat-field, and w'ould 
have to be used as target illuminators. We soon discovered that the problem of evenly 
illuminating a diffusing target w ? as not trivial, so this approach was not an optimal solution 
either. (For a target, we used an opaque sheet of coated matte paper that was supplied as 
a protective spacer for laser-printer color transparency blanks.) 

We did obtain good uniformity using the blank screen of a laptop computer as a transmit- 
ted flat-field. The internal configuration of such a device is unknown to us, but the operating 
frequency seemed to be high enough to provide temporal uniformity. It was impossible to 
vary the brightness of this field while maintaining the spatial uniformity, though. Although 
when white the pixels were uniform, their brightness varied from row to row 7 w r hen they were 
darkened. Also, the overall intensity of the source was insufficient to permit the use of neu- 
tral density filters to obtain different brightness levels. Although such a source is convenient 
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and generally available, the difficulty of evenly varying its brightness makes it unsuitable for 
the purposes of our model. 

3.2.2 Incandescent sources 

We found that an AC-powered incandescent source was also not entirely free from temporal 
flicker. While DC-powered incandescent sources are available, we also found it impossible 
to obtain spatially uniform illumination from such a source. We tried bulbs mounted in 
reflectors and an overhead projector, each shone through a diffusing sheet, but were unable 
to completely remove the filament image from the illumination. 

Consultation with a Kodak research scientist gave us some insights into general optical 
laboratory practices. We learned that DC-powered incandescent sources may be suitable for 
flat-field production, but only if a precision unit costing several thousand dollars is used. 
Also, such experiments should be conducted in a temperature-controlled room, and after the 
light source has been stabilized for at least. 24 hours. The light from such a source cannot 
be used to illuminate a flat surface, but rather must be ported into an integrating sphere, 
which is a hollow sphere with very small ports and a Lambertian inner coating. Besides 
being very expensive as well, such a sphere is of little use after the CCD sensor is mounted 
into the camera body. 

3.2.3 Solar source 

While we were investigating the suitability of other sources, we also tried to obt ain a flat-field 
from a diffuse surface posed near a window. Note that if properly monitored, the short-term 
temporal uniformity of the sun as a source can be excellent, as sunlight does not flicker. 
However, we found it generally impossible to reliably duplicate a uniform illumination of a 
flat diffuse surface. Every time we posed the target (on an artist’s easel, mounted to a flat, 
uniform, and non-reflective surface), the pattern of light distribution varied. 

As a location providing no less than 50 degrees of completely unobstructed open sky 
was available, we began to consider using transmitted suidight, as a flat field. Although the 
resulting setup could not be as completely specified as if a particular model of xenon lamp, 
for example, was used, the experiment could still be duplicated anywhere that an expanse of 
open sky was available. Also, we realized that the spectral content of the illumination could 
vary without our knowledge and affect the camera’s response. The freedom from flicker and 
the apparent uniformity of a patch of blue sky far from the sun led us to investigate this 
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Figure 3.1: The test setup used for capturing flat field images using the solar source. 


possibility. 

We began by thoroughly cleaning the window inside and out. Then, a double layer of 
matte acetate was mounted as a diffusing target to increase uniformity and also to reduce 
the illumination level somewhat. The camera lens was positioned about six inches away from 
the acetate, pointed at the center of the open expanse, and focused to infinity. This test 
setup is illustrated in Figure 3.1. 

We used a Nikon 58mm f/2.8 Micro-Nikkor lens for most of our experiments, as the 
performance of this lens was rated at the top in several surveys we found on the Web. The 
F-mount allowed us to always mount the lens in exactly the same rotational orientation, 
although we later found the lens to have excellent symmetry of response with respect to 
the optical axis. In order to obtain varying levels of sensor illumination, we varied the lens 
aperture (minimum aperture: f/32) and also used high-quality neutral density filters. 

W T e conducted our experiments under clear skies in April 1998 between 10:00 AM and 4:00 
PM. We found that after 4:00 PM, the level of illumination began to decrease perceptibly 
from minute to minute. Images were collected in sets of 100 at no less than 25 fps, so each 
set was collected in less than 4 seconds. Each run of sets, characterized as a number of sets 
taken at several different apertures, was collected as quickly as possible (generally within 10 
minutes) to provide a tentative basis for comparing the lens response at different apertures. 


17 




Uniformity of illumination level within each set of 100 images was verified by taking the 
average pixel value for each image, identifying the maximum among the 100 averages, and 
characterizing the other averages as percentages of the maximum. A set, and consequently 
the entire run, was rejected if a deviation of more than 1 % was discovered for average image 
values of 100 gray levels or more, on a scale of 0 to 255. For average values of under 100 gray 
levels, deviation of from 3 to 5 % were sometimes accepted, as we recognized that camera 
noise processes contributed a greater portion of the error in such images. 

Once the uniformity of each set within a run had been verified, each set w r as condensed 
into two 1000 x 1000 32-bit floating-point arrays. The first array was the pixel-by-pixel mean 
of the 100 images of the set, and the second array was the pixel-by-pixel variance. These 
two arrays became the input parameters for the model described in Section 6 below\ 


3.3 Development of pattern noise experimental analy- 
sis 

In this section, a detailed mathematical analysis of the linear system model is presented. 
Assumptions are identified and discussed. 

We begin by assuming that the behavior of each pixel at any particular moment in time 
can be described by an equation of the form 


y = mx + c + r] = E[y] + rj 


(3.1) 


where 


• y is the digital output signal, 

• x is the incident illumination (with xq defined as zero), 

• c is the constant portion of the dark field noise, or additive fixed pattern noise (FPN), 

• m is the constant portion of the interpixel photoresponse nonuniformity or multiplica- 
tive FPN, 

• rj is a noise term that, includes all other system noises such as shot noise, readout noise, 
system nonlinearities, and quantization error, 

• E[y] is the expected value of y. 
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The temporal noise q can be modelled as a zero mean Gaussian variable with a variance 

V[rj] = V[y] - w 0 + w x E{y\ (3.2) 

The term w 0 corresponds to the constant portion of the noise variance, caused mainly by 
the readout noise, whereas the w\ term corresponds to the shot noise, which is Poisson 
distributed with a variance proportional to the output mean. The resultant noise due to 
these terms can be approximated with a Gaussian distribution. 

If the FPN parameters m and c can be determined in advance, the FPN can be compen- 
sated prior to further processing. This is likely to improve the performance of the detection 
algorithm. Also, the temporal noise parameters w 0 and w\ would help us determine the 
performance of the algorithm. 

Let xo,X\,Xi...x n denote a number of intensity values at which observations are made, 
with x 0 = 0 denoting the zero intensity. For each intensity, we can write: 


Hi = mxi + c + Tj (3.3) 

Let the mean and variance of a y, be denoted by E t = F[y,] and V) = V[//,] = of. Then. 

1 i = «?o + W\Ei (3-4) 


An estimate of the mean and variance can be obtained by using the sample mean and variance 
of a number of images obtained under ident ical conditions. A set of such equations obtained 
by substituting these in (3.4) can be solved in least squares sense to give the values of w Q 
and Wi. 

For determining values of m and c for each pixel, the following method is used. Denoting 
the average over N observations with an overbar, we have 


Vi = mxi + c. + 7)i 


(3.5) 


Since r/j is assumed to be normally distributed as jV(0,of), yt is normally distributed with 
parameters 


E[Vi. 


= mxi + c 


V'N = v[ni\ = 


v\ 


N 


N 


Consider a neighborhood centered at the current pixel. Assume that the neighborhood is 
small enough so that the incident illumination x t remains approximately constant across it, 
but large enough so that the constituent pixels’ nonuniformities will average out to provide 
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us with a good measure of the local incident illumination. Denoting the average value over 
this neighborhood by the operator we have 

(him] = lisle] + // s ['/o] 

Hs[Vi] = lia[m]xi + // s [c] + Hs[>li\ 

giving 

_ p»[vi\ - i i *[yo] - Hsfo) + p,M 

;li /i.H 

The noise terms rji, are assumed to be independent and distributed as iV(0, of). We also 
assume that these terms are uncorrelated in space, i.e. that the values r/, for any pixel are 
independent of the values for the pixel’s neighbors. Then 

,«,[%] ~ J V (o, ^ ) , 

where N s is the number of pixels in the averaging neighborhood. Because N s will be on the 
order of 50-100, the terms li s [Vi] and fi s [yo] will be distributed very close to zero, and we 
may discard them in the derivation. 

Note that only the relations between the various illumination levels Xi are important, and 
not their absolute values. We may therefore choose any convenient scale for our estimating 
X{. Setting f.i s [m\ equal to 1, we obtain an estimate of X{ given by: 

Xi = HslVi] ~ Vs{yo] (3.6) 

From 3.5 and 3.6 we get 

Vi - Xi = (m - 1 + c + f) z - 
We define the following variables: 

2o = c - n s [yo], 

Z\ = m — 1. 

After substitutions, we obtain the expression 

Vi ~ lh[Vi] = Z\Xi + 2o + Vi- 

We may now express the behavior of any particular pixel across various levels of incident 
illumination with the linear system 

y = Az + C, 
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where 


y = [(.Vo - /<*[vo]), (;vi - n«[y i]), • • • , (■ y n - ti,[v n ])] T , 


A = 



1 


1 


T 

= ( a 0 0\ ■■■ a n ) T , 


* 1^01 


C = [m ■ • • Vnf 


We may also assume that the noise terms rji are uncorrelated in time, so that the error 
C = {y — Az ) is distributed as A r (0, R ), where 

R = jj diag(a 0 2 , a\, ■■ -al). 

Therefore, we can apply least-squares methods to estimate z, as well as its covariance P. 


3.4 Results of noise estimation 

In this section, we describe the results of estimating the spatial and the temporal noises. 
The spatial noise can be reduced by using the estimates of its parameters for every pixel, 
to compensate it. However, the temporal noise varies from frame to frame, and therefore 
cannot be reduced by such a method. 

3.4.1 Estimation and correction of spatial noise 

In this section, results showing significant reduction of pattern noise in time-averaged images 
are presented. The model’s lens-independence is demonstrated. Failure of the model to 
reduce noise in individual images is also noted. 

CCDs are universally reported to be extremely linear devices. Indeed, the basic assump- 
tion of our camera model above is that, each pixel operates in essentially a linear fashion. 
In order to test that assumption, we conducted preliminary experiments to determine the 
camera’s response at different flat-field illumination levels. Results of these experiments are 
shown in Figure 3.2, where each point represents a mean value of the central 1000 x 1000 
pixels of the array, and Figure 3.3, where the mean dark value has been subtracted. We 
varied the illumination level by changing the lens aperture, and assumed for the purpose of 
these experiments that the illumination thereby changed in perfect powers of 2. Therefore, 
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Linearity ot sensor response 



Figure 3.2: Camera’s response at different flat-field illumination levels. Each point represents 
a mean value of the central 1000 x 1000 pixels of the array. 


Linearity ot sensor response: avg dark level subtracted 



Figure 3.3: Camera’s response at different flat-field illumination levels, after subtracting the 
mean dark value. Each point represents a mean value of the central 1000 x 1000 pixels of 
the array. 
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Photon transfer curve for exposure 200 (13.1 x 10" 3 tec) 



Figure 3.4: The plot of the signal variance against the signal mean. 


the largest error in these plots is in the ordinate (illumination level), and not in the abscissa 
(camera response). 

For convention’s sake, we also present a rudimentary photon transfer curve in Fig- 
ures 3.4 and 3.5. This curve, a plot of signal mean against signal variance, is the most 
commonly used CCD performance curve. Generally, three areas should be discernible, cor- 
responding to the noise process that predominates in each section. To the left, the curve 
is theoretically flat, as readout noise predominates at low levels. In the center, the curve 
has a slope of 1, as photon shot noise is the predominant noise process here. On the right, 
the curve has a slope of 2, corresponding to pixel nonuniformity noise (i.e., fixed pattern 
noise). Of these three noise processes, of course, only the last is in any way deterministic 
and compensable. The point where the second and third sections meet represents the signal 
level at which fixed pattern noise limits the camera’s sensitivity. In these terms, the object of 
this research is essentially to move this point to the right. Our final report will include more 
refined versions of this curve, updated to incorporate the large amounts of data obtained 
since these plots were generated. 

Our primary lens was the Nikon 58mm Micro-Nikkor, and its response to a high-level 
flat-field illumination at apertures of f/4 and f/16 is shown in Figures 3.6and 3.7. (Unless 
noted otherwise, we use examples taken at high levels of illumination throughout this section. 
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Photon transfer curve for exposure 200 (13.1 x 10 sec): avg. dark value subtracted 



Figure 3.5: The plot of the signal variance against the signal mean after subtracting the 
mean dark value. 


Such images contain the highest noise levels, and also the greatest proport ion of compensable 
noise.) The reduction in lens response as one moves away from the image center is clearly 
evident. More puzzling to us is the fact that the response becomes more non-uniform as 
the aperture becomes smaller, as lenses are generally assumed to perform better at smaller 
apertures. This effect is illustrated in Figure 3.8 by taking a cross-section of the image (i.e., 
the center row) at different apertures. Efforts to consult with a local optical expert in order 
to explain this effect are ongoing. 

Figure 3.9 shows cross-sectional results at aperture f/4 for several different runs of images 
(collected as described in section 5 above, over a period of days or weeks), each at a different 
level of incident illumination. In Figure 3.10, these curves are normalized to correspond to 
Run #1 at the central pixel. One can see that errors as great as 1 % as evident on the left side 
of the image. As we would expect the lens response at any one point to be perfectly linear 
with respect to changing illumination levels, we must assume that these errors represent a 
flaw in our flat-field illumination. One possible cause is that (with reference to Figure 3.1) 
our tests were conducted in a room with white walls, and reflections from the room onto 
the acetate and thereby into the camera may have corrupted our results. Note, however, 
that the errors are generally much smaller than 1 %, and that the largest normalization 
factor is nearly 2. Overall, then, this figure demonstrates that our flat-field setup is quite 
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Effect of aperture setting on local illumination level (center row, exposure 200) 



Figure 3.8: The cross section of the lens response at the center row, showing plots of the 
normalized gray level, for a number of aperture settings. 


consistent. We would expect that results could be further improved by posing a black matte 
shield around the camera to keep out reflections. 

One of the runs was arbitrarily selected, and the procedure described in Section 6 above 
was used to develop a camera noise model. Figure 3.11 shows the result of applying that 
model to the image of the highest-level set in this run. Essentially, then, we are applying 
the noise model to itself here. Therefore, this image represents the limit to the amount of 
noise reduction we can expect. Assuming that the image presented by the lens is a smoother 
version of the final curve, we can see that a very small level of noise can be expected. 

Figure 3.12 shows the result of applying the noise model to the high-level set image of a 
different run (i.e., one taken on a different day). It is immediately obvious that our model 
gives a significant degree' of improvement. 

Figure 3.13 shows the model as applied to a low-level set image of a different run. The 
level of improvement is much reduced, as the level of compensable (i.e., deterministic) noise 
in such an image is negligible. 

As our model is specific to the array, and not to any particular lens, we also tested 
its application to set images from a different lens. Figures 3.14 and 3.15 show the flat- 
field response of a Fujinon zoom lens set to 12.5 mm at apertures of f/5.6 and f/16. It is 
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Incident illumination levels at aperture f.'4 (center row, exposure 200) 



column number 


Figure 3.9: Cross sectional results at aperture of f/4 for several different runs of images, each 
at a different level of illumination. 


Normalized illumination levels at aperture f/4 {center row, exposure 200) 



column number 


Figure 3.10: Cross sectional results at aperture of f/4 for several different runs of images, 
each at a different level of illumination, normalized with respect to Run #1 at the central 
pixel. 


27 




218 


Application of noise model to constituent image (center row, exposure 200) 



Figure 3.11: Result of applying the camera noise model to an image from a high-level set. 


Application of noise model to non-constituent image (center row, exposure 200) 



column number 


Figure 3.12: Result of applying the camera noise model to an image from a high-level set of 
a different run. 
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Application ol noise model to new image (center row, exposure 200) 



Figure 3.13: Result of applying the camera noise model to an image from a low-level set of 
a different run. 


immediately apparent that the response of this lens is not so precise as that of the Nikon 
lens, perhaps due to the mechanical compromises necessary for the zoom operation. 

Figure 3.16 shows the result of applying our model to a high-level set from this lens. 
Comparison to Figure 3.12 shows that many noise spikes occur in the same location, as 
would be expected. However, the general shape of the curve shows the same dip to the right, 
suggesting again that our flat-field is slightly flawed. Improvement does not quite reach the 
level of that in Figure 3.12, but is excellent nonetheless. 

Finally, we note that in all of the results above, the noise model was applied not to 
individually captured images, but rather to the mean image taken from 100 individual images. 
Therefore, temporal (nondeterministic) noise was significantly reduced. In Figure 3.17, we 
applied the noise model to an individual image. One can see that our model had little 
effect beyond smoothing the quantized levels. The results above clearly demonstrate that 
for a set of time-averaged images, our model can offer a significant reduction in camera- 
generated noise. Unfortunately, for individually captured images, temporal noise processes 
predominate which cannot be removed. 
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Application of noise model to image from different lens 



column number 


Figure 3.16: Result of applying our model to a high-level set from the Fujinon lens. 


Application of noise model to single image (center row, exposure 200) 



column number 


Figure 3.17: Result of applying our model on an individual image, instead of the mean of 
images. The model has little effect beyond smoothing the quantized levels. 
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3.4.2 Estimation of temporal noise 

To estimate the temporal noise in the camera images, we collected runs of sets of images. 
Each set contained 100 images (i.e. N = 100). collected at a rate greater than 28 fps (i.e. over 
a period of less than 4 seconds). In order to verify that the illumination level was constant 
across this period, we took the image-wide mean of the central 1000 x 1000 pixel region of 
each image, and found the difference between the means of the brightest and darkest images 
in the set (relative to the mean of the darkest image). For brightlv-lit images (i.e. mean 
gray level from 150 to 255), the maximum such difference we accepted was 1.02%; most 
differences were less than 0.5%. At lower levels of illumination, the increasing proportion of 
temporal noise caused this difference to rise, but even at the lowest illumination levels we 
accepted no runs with differences greater than 5%. Each run comprised three to eight sets 
of images, taken at progressively narrower aperture settings. We collected a run within as 
short a time as possible, to permit us to assume if necessary that the available illumination 
had remained constant (such an assumption is not necessary for the model described here). 

Each accepted set of images was later condensed into two floating-point arrays of size 
1024 x 1024, representing the sample mean and sample variance values at each pixel. 

For estimating the temporal noise parameters, imagewide means of these arrays are used 
as estimates of if, and V) in the equation: 

1 i — Wq + a; | E, (3.7) 

The equation is solved using least squares to obtain the paramters Wq and w , . The plot of 
\\ to Ei is shown in Fig. 3.18. Its slope is ui\ and the y-int.ercept is w Q . The values of these 
parameters obtained are: 

w 0 = 0.171, wi = 5.6 x 10” 3 (3.8) 

This corresponds to a noise variance of a 2 = 0.888 or standard deviation of a = 0.942 for 
the background value of 128. This value is used in experiments for testing target detection 
algorithms. 
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Chapter 4 
Future Work 


The following avenues of future work can be explored. The spatial frequency response of the 
optical system consisting of the lens and the camera, can be characterized using the concept 
of Modulation Transfer Function (MTF) as described below'. Also, the camera noise models 
can be validated using a statistical hypothesis test. 

4.1 Description of Modulation Transfer Function esti- 
mation 

The spatial frequency response of an optical system is commonly characterized with a plot of 
the system’s modulation transfer function (or MTF), which shows the normalized magnitude 
of the system’s response to a range of input frequencies. The MTF may be expressed as 
the Fourier transform of the system’s line spread function (LSF), which is the response to a 
flat-field containing a single sharp line. In other words, the LSF is a tw r o-dimensional analog 
to an impulse response. (For the purposes of this overview, w r e will ignore the fact that a 
CCD array responds differently in the row 7 and the column dimensions.) 

Traditionally, the LSF of a CCD is obtained by projecting a very narrow 7 band of light 
onto the array. In an SPIE paper, Lin and Chan describe a method of computing the MTF 
from the edge-spread function, which may be differentiated to obtain the LSF [4], The 
edge-spread function is obtained from a high-contrast target, making the measurement more 
flexible, more accurate, and less expensive than the traditional method of LSF measurement. 
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4.2 Noise model validation 


An important issue to be addressed as part of this research is the validity of the noise 
models used to generate synthetic images. In other words, one must answer the question of 
whether the computer generated images are indeed a set of representative images suitable 
for the performance characterization of the detection algorithms. This can be done by using 
a statistical hypothesis test described in [3]. 
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Part II 


Algorithms for Detecting Airborne Obstacles 


Abstract 


This part describes the approaches used for detecting airborne obstacles using 
image sequences obtained from a camera mounted on an aircraft. A number of basic 
algorithms were implemented for airborne obstacle detection. The performance of these 
algorithms was characterized in presence of camera noise using theoretical and 
experimental methods. Since the performance degrades in the presence of background 
clutter, a special approach to address the problem of hazard detection in presence of 
clutter was studied. This approach uses the differences in the behavior of translation and 
expansion of image features corresponding to the objects on a collision course and the 
background clutter. Algorithm fusion for combining different algorithms to overcome 
their individual limitations was also studied. In addition to this work on detecting objects 
on collision course, algorithms for detecting objects crossing the aircraft were designed 
and implemented on a real-time system. The image processing and tracking steps of the 
system are described in this part, whereas the hardware implementation is described in 
the next part. 
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Chapter 1 
Introduction 


Image sequence analysis has been widely used in computer vision. This part describes the 
use of image sequences for detection of airborne obstacles in the flight path of an aircraft . 

Continued advances in the fields of image processing and computer vision have raised 
interest in their suitability to aid pilots to detect, possible obstacles in their flight paths. 
For the last few years, NASA has been exploring the use of image sequences for detecting 
obstacles in the flight path of an aircraft. NASA Langley Research Center supported a 
project to enable pilots to ‘see through fog’ using Passive Milli-Meter Wave (PMMW) images 
of low resolution. For this project, Tang and Devadiga [12] from our group had developed 
methods to locate the runway and detect obstacles on and outside the runway. The resulting 
output can be used by the pilots to decide whether to land or not. 

Obstacle detection is also possible with visible-light image sequences. In the design of 
a High Speed Civil Transport. (HSCT) aircraft with a limited cockpit visibility, NASA has 
proposed a Synthetic Vision System (SVS) in which high resolution video images would 
be obtained using cameras mounted on the aircraft. These images can be used to detect 
obstacles in the flight path to warn the pilots and avoid collisions. For aircraft operations, 
both airborne obstacles, as well as the obstacles on the runway surface should be detected. 

Algorithms for detection of airborne objects from images are abundant in the published 
literature. A systematic performance characterization of a number of target detection al- 
gorithms was performed bv using image degradation models for digital cameras. It was 
observed that the algorithms that, were studied have a good performance on images which 
do not have background clutter. However, the performance degrades severely when back- 
ground clutter is present. Thus, the goal of this work has been to design algorithms which 
perform better in cluttered background environments, with low probabilities of false alarms 
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and mis-detections and capability of target, detection early enough to avoid a possible colli- 
sion. To achieve this goal, a special approach was used to discriminate hazardous objects on 
collision course from the background clutter. Algorithm fusion was studied for combining 
different algorithms in a statistical framework, to overcome their individual limitations. The 
performance of the fused algorithm was found to be better than the individual algorithms 
under appropriate conditions. 

This part of the report is organized as follows: Chapter 2 describes the basic, well-known 
algorithms used for detection of airborne obstacles. These algorithms were tested on real 
image sequences provided by NASA. In Chapter 3, the performance of these algorithms 
is experimentally characterized using the approach described by Kanungo et, al. [10]. The 
theoretical characterization of the algorithms’ performance is described in Chapter 4, and 
the experimental performance is compared with the theoretical performance. 

The main contribution of the research for the detection of hazardous objects is described 
in the next two chapters. A special approach is proposed for discrimination of objects 
on collision or near-collision course from background clutter. This approach is described in 
Chapter 5 where differences in the behavior of translation and expansion in the image are used 
to separate hazardous objects from clutter. Chapter 6 describes the Bayesian methodology 
used for combining detection algorithms in a statistical framework. Performance of fused 
algorithm is compared with that of the individual algorithms. 

In addition to hazardous objects, it is also useful to detect and track objects crossing 
in front of the aircraft. A real-time system using pipelined image processing hardware was 
designed for this purpose. Chapter 7 describes the image processing operations which are 
performed by the pipelined hardware, and the tracking operations performed on the host 
machine to form a complete real-time system. 

Chapter 8 concludes the part and explores avenues for future work. 
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Chapter 2 

Object Detection Algorithms 


This chapter describes the algorithms that were implemented to detect airborne obstacles 
in the flight path of a flying aircraft. Statistical theory used for target detection is first, 
described, followed by a number of basic steps useful for removing background clutter, am- 
plifying the signal to noise ratio, and detecting objects having different sizes and velocities. 
Results obtained bv using real image sequences are also described. 

2.1 Background 

NASA's need for enhanced capabilities in obstacle detection using image processing requires 
robust, reliable and fast techniques. These techniques should provide a high probability of 
detection while maintaining a low probability of false alarm in noisy, cluttered images of 
possible targets, exhibiting a wide range of complexities. The size of the image target can be 
quite small, from sub-pixel to a few pixels in size. As an example, consider a Cessna aircraft 
that has a length and wing-span of approximately 9 m (30 ft) and the fuselage diameter of 
approximately 1.2 m (4 ft) [19]. The detection algorithm must be capable of detecting this 
small target at least 25 seconds prior to a possible collision to allow for corrective actions 
by the pilot. Assuming that both the aircraft are traveling at 125 m/s (250 knots), their 
relative velocity can be as high as 250 m/s (500 knots). In such case, they would be 6.25 km 
(3.5 nautical miles) apart 25 seconds before collision. Using a camera with a resolution of 
60 pixels per degree, the image size of the aircraft is 5.0 x 0.7 pixels from a side view, but 
only 0.7 x 0.7 pixels from a front view. Furthermore, the detection algorithm must report 
such targets in a timely fashion, imposing severe constraints on their execution time. Finally, 
the system must not only work well under the controlled conditions found in a laboratory 
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and with data closely matching the hypothesis used in the design process, but it must be 
insensitive - i.e., must be robust - to data uncertainty due to various sources, including 
sensor noise, weather conditions, and cluttered backgrounds. 

Extensive work has been done on the problem of target detection. When the signal to 
noise ratio is low, it is preferable to use the ‘track before detect’ approach. In this approach, 
an object is tracked over multiple frames before making a hard decision on the presence or 
absence of a target . The simplest way to integrate the input images over multiple frames 
is by temporally averaging them. When the image motion of the object is very small, as 
in the case of an object being exactly on a collision course [14], this happens to be the 
best approach. However, if the object has a significant image motion, other approaches are 
needed. Nishiguchi et, al. [16] proposed the use of a recursive algorithm to integrate multiple 
frames while accounting for small object motion. A dynamic programming approach was 
used by Barniv [4] and Arnold et al. [2] to detect moving objects of small size. The theoretical 
performance of this approach was characterized by Tonissen and Evans [18]. 

The above algorithms perform well when the background is uniform. However, in real 
situations the hazardous object should be detected not only against uniform background, 
but also against backgrounds such as clouds, ground or water. The features introduced due 
to a non-uniform background which interfere with object, detection are collectively known 
as clutter. Thus, the objective of the detection algorithms is to successfully detect the 
hazardous object, without giving unnecessary false alarms from clutter. Subtraction of 
consecutive images is often used to remove stationary clutter. However, an object on a 
collision course could be nearly stationary in the image [14]. Hence, this method is not useful 
for our application, since it could remove the object as well. Alternatively, morphological 
filtering [6] removes objects of large size, usually corresponding to clutter while retaining the 
objects of small size. This approach is useful in removing large clutter, such as clouds. But 
it does not remove small-sized clutter. 

2.2 Statistical decision theory for target detection 

Statistical decision theory [13, 17] can be used to design optimal or near-optimal detection 
algorithms, as well as to characterize their performance. The input to the algorithm is a 
sequence of images, each composed of a large number of individual pixels. These pixels are 
degraded by various sources, such as atmosphere, lens, and camera noise. Based on the 
statistical behavior of this degradation, the image pixels can be combined in space and time. 
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to make statistically optimal decision about the presence or absence of a target. For making 
these decisions, probabilistic models of the signal and its degradation can be used. 

Let H 0 and H i denote the hypotheses that the target is absent or present, respectively, 
and P(H 0 ) and P{H\) denote their respective prior probabilities. Let 2 represent the vector 
of observations from which one is supposed to determine the presence or absence of a target . 
By Bayes’ rule, the posterior probabilities are given by: 


P(H ] \z) = 


pjz jWi) 

p(z) 

The ratio of these probabilities is given by: 


, P(H 0 \z) 


pjz\Ho)p{Ho) 

P{z) 


( 2 . 1 ) 


p(//,iz) p{hmz\h,) pm . v r99 , 

P(H 0 \z) P(H 0 )p(z\Ho) P{Ho) H ^ ’ 

where L H (z ) proportional to the ratio of the probabilities is called the likelihood ratio. 

When the algorithm reports a target even where there actually is none, it is called a 
false alarm, whereas when it does not report an existing target, it is called a mis-detection. 
The performance of a detection algorithm is characterized in terms of false alarms and mis- 
detections. According to the Neyman Pearson criterion [13, 17], the number of mis-detections 
for a given rate of false alarms can be minimized by thresholding the likelihood ratio L H (z). 
The threshold is a function of the required rate of false alarms. In place of the likelihood 
ratio, any of its monotonic function (such as the logarithm) can be used. Such a function is 
called a discriminant function. 

To decrease the probabilities of false alarms and mis-detections, one can integrate ob- 
servations spatially or temporally. Let the N elements z u z 2 ...z N of z be independent 

observations. The likelihood ratio and its logarithm (log likelihood ratio) are given bv: 


T t \ p(~l> ^2 • • • ^n\Hi) T-r p{Zi\H] ) 

p(z u z 2 ...z N \Ho) ~}Jip(z i \H 0 ) 
l(z) =log L h (z) = [logp(2i|P 1 ) - logp(*j|//o)] 


(2.3) 


(2.4) 


7 = 1 


In the case of zf s having normal distributions in absence and presence of target, such that 
their probability density functions are: 


p(zi\H 0 ) 


1 
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The log likelihood ratio is given by: 


l{z) = log L h (z) = 

7=1 


v (z, - p) 2 + zf p rN 


2a 2 


E* 
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This is a monotonic function of Hence, thresholding the sum (or mean) of the obser- 
vations yields an optimal detector. Since sum and mean are linear functions, they are also 
normally distributed. 

Consider a discriminant function which is normally distributed in absence and presence 
of target as iV(// 0 , CTq) and rr'(), respectively with equal variances a\ = erf but unequal 
means /i 0 and pi- If this function is thresholded to obtain a particular false alarm rate, it 
can be shown that the corresponding mis-detection rate is a function of its Signal to Noise 
Ratio (SNR) given by (//i — // 0 )/er o- Hence in this case, the performance in terms of false 
alarm and mis-detection rates is determined by the SNR. 

If N independent normal observations are made, their sum is distributed as iV(0, Ncr 2 ) 
in absence of target, and N(N //, Na' 2 ) in presence of target. Hence, the SNR is given by 
Nfi/v No 2 = y/Nfi/o i.e., amplified by a factor of y/N. In other words, a signal with 
SNR of S/ y/N integrated over N frames could yield the same rate of false alarms and mis- 
detections as one would get using a single observation with SNR of S. Hence, the SNR 
required for detection reduces by vN when N frames are added. The same result is true for 
averaging of N frames, since the signal as well as the noise would be reduced by a factor of 
N. 


2.3 Pre-processing 

Before any other algorithms can be applied, pre-processing should be performed on the input 
images to suppress the background. The following approaches were used for pre-processing 
the images. 

2.3.1 Low-stop filter 

In the case of an image with little or no clutter, a low-stop filter which subtracts from 
every pixel, the local average of the neighborhood of that pixel effectively suppresses the 
background intensity. This filter can be implemented by convolving the image with a 2-D 
mask corresponding to the filter. Since the amount of computation increases with the mask 
size, a small sized mask was used in conjunction with the pyramid approach described in 
Section 2.4 to simulate the effect of a large sized mask. 
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2.3.2 Morphological filter 


If the background has significant clutter, the low-stop filter is not as effective for removing it. 
A morphological filter [6] can remove large sized features (usually clutter), while retaining 
small sized features (usually targets). 

The gray-scale morphological operations of dilation (©) and erosion (©) are defined as: 
(/ © m)(x, y) = max {f{x - x', y - y') + rn(x', y')} (2.7) 

(r,y )Oh 

(/ © m)(.r, y) = min {f(x + x',y + y')-rn(x',y')} (2.8) 

{x f 

where m is the mask using which the morphological operation is performed, and / is the 
image which is considered to have a default value of -oc outside its domain. Morphological 
closing and opening can be defined using the above operations as: 

if • m) = (/ © in) © m > f (2.9) 

(/ o m) = (/ © m) © m < f (2.10) 

A difference between the original image and its morphological opening, known as the top- 
hat transform outputs small-sized positive targets i.e., bright targets in dark background. 
On the other hand, the difference between the morphological closing and the original image, 
known as the bottom-hat transform outputs negative targets i.e., dark targets in bright 
background. Each of these images are non-negative, and can be separately used to detect 
targets. 

A single mask for these morphological operations gives undesirable outputs for jagged 
boundaries of large features. Hence, horizontal mask m x and vertical mask rn y were used 
separately as proposed by [6]. These masks are of length 5 with origin at the center of the 
mask, with all the pixels having the default value of zero. The outputs are given by: 

F+ = F — max { F o m x , F o m y } 

F_ = — F + min{F • m x , F • m y } 

2.4 Spatial integration 

To detect targets of a number of different sizes and velocities, and to amplify the SNR , the 
target pixels in a given image can be integrated by forming an image pyramid. For this 
purpose, the following basic operations are used: 


( 2 . 11 ) 

( 2 . 12 ) 



1. Low-pass filter ( LP or LP): Convolves the image in x and y directions with the masks 
m x = m y = [1,3, (3), l]/8, or their mirror images. The parentheses denote the origins 
of the masks. 


hp{x, y) = Y.Y, f( x ~ J ' , ’ y - v') m x( x ') m y (y') 

x* y' 

fnp( x iv) = ^J2f( x + x 'pj + y') m x( x ') m y(y') (2.13) 

x' y' 

2. Down-sampler ( DS ): Selects even numbered pixels in the input image to give an image 
with half the resolution. 

fDs(x,y) = f{2x,2y) (2.14) 

3. Up- sampler (US): Forms the output image by putting the input image pixels in even 
numbered positions, and zeros in odd numbered positions. The image is scaled by 2 to 
maintain the image intensity during subsequent low-pass filter step. 

fus(x, y) = 2f(x/2,y/2) when x, y are even: 0 otherwise (2.15) 

These steps are combined to form two types of operations: 

1. Low-pass down-sample operation (LP — » DS): Decreases the resolution of the image 
by two. Low-pass filter prevents aliasing of high frequencies in the image by suppressing 
them. 

2. Up-sample low-pass operation (US — ► LP): Increases the resolution of the image by 
two. Low-pass filter smoothes the output of the up-sampler (containing zeros at odd 
pixels) to produce the effect of interpolation. In this case, the mirror image masks are 
used to compensate the asymmetry in the masks. 

The above operations can be used to combine pyramid formation with low stop or mor- 
phological filtering by using the system shown in Figure 2.1. Images pyr[i] are formed by 
successively applying low-pass and down-sample operations on the original image. These 
images can be directly used as inputs to the morphological filter to detect targets at differ- 
ent resolutions. Images pyr'[i] are formed by successively applying up-sample and low-pass 
operations to the lowest resolution image pyr[n ], where n is the number of pyramid levels. 
These operations remove the high frequency components of the original image. Low-stop 
filtered images are given by ls[i] - pyr\i ] - pyr'\i ], and retain only the higher frequency 
components not subtracted out by pyr'[i}. 
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Figure 2.1: Spatial integration using pyramid construction: LP or LP: low-pass filtering 
with original mask or its mirror image, DS: down-sampling, US: up-sampling. The pyramid 
images at stage i = 0 . . . n are denoted by pyr[i\. Low-stop filtered images are obtained the 
by subtracting the corresponding up-sampled pyramid outputs pyr'\i] from pyr[i] and are 
denoted by ls[i]. 


In this way, a hierarchy of images, each with half the resolution of the previous one is 
formed. The size as well as the velocity of the object in the image scales as the resolution is 
lowered. There is a particular resolution at which the object occupies no more than 2 to 3 
pixels in length and width, which would be optimal for detection of the object. 

2.5 Temporal integration 

As shown in Section 2.2, integration of pixels corresponding to a target results in amplifi- 
cation of the target SNR, and increased reliability of detection. Depending on the image 
motion of the target, the following approaches can be used for integration of target pix- 
els over a number of image frames. The performance of these approaches is characterized 
experimentally and theoretically in Chapters 3 and 4, respectively. 

2.5.1 Recursive temporal averaging 

In the case of objects on a collision course [14] the image motion is very small. Hence, pixel 
wise temporal averaging of a sequence of images would improve the detection performance. 
However, direct use of temporal averaging results in infinite memory. To give a higher weight 
to more recent observations, a recursive filter can be used. The output F(k) at time k for 
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any pixel is recursively obtained from the input f(k) at the same pixel using the following 
steps: 

1. Initialization: F(0) = 0 

2. Recursion: F(k ) = f(k) + oF(k - 1) 

where a is a forgetting factor between 0 (full forgetting) and 1 (no forgetting). 

2.5.2 Dynamic programming 

In the case of moving targets, the temporal averaging filter does not improve the detection. 
A dynamic programming algorithm [2] is more effective in detection of moving targets. The 
algorithm is based on shifting the images before averaging them so as to align the target to 
be detected. Since the velocity of the target could be arbitrary, the velocity space (il v) is 
discretized within the range of possible target velocities. A set of intermediate images F . 

each corresponding to a particular velocity (u.v), are created recursively using the following 
steps: 

1. Initialization: For all pixels (:r, y) and all velocities (u, ?;), set 

F{x,y:u,v\ 0) = 0 (2.16) 

2. Recursion: At time k , set 

F(x, y; it, v; k) = f(x, y; k) + a max F(x - u - x\ y-v- y'\ u. v: k - 1 ) (2 17) 

h'.y)eQ ' ’ ' 

where 

Q = {O', V')W,nin < S' < <nax , Unnn < V < V'max} (2-18) 

3. Termination: At time I \ , take 

Fmax{%, V\ A') = max Fix, y; u, v: K) (2 19) 

w r here 

P = {(«, v)\u imn < u < u max , v rnm < v < v max } (2.20) 

The maximum operation in the recursion step is performed using the set Q , which ensures 
that the targets with velocities which do not fall on the grid are not missed. The set of 
discretized velocities denoted by P determines the range of target velocities that can be 
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detected by the algorithm. The final maximum in the termination step combines the targets 
corresponding to all the velocities. The number of elements in P and Q are denoted by p 
and q, respectively. 

In the recursion step, a maximum is taken over q pixels. If these pixels are all noise 
pixels, they are more likely to give a false alarm if q is large. Thus, the rate of false alarms 
increases with q. To get better performance, a smallest possible q should be used. The value 
of q = 4 has been used in our experiments corresponding to a 2 x 2 neighborhood, given by: 

Q = {(0, 0), (-1,0), (0, -1), (— 1, -1)} (2.21) 

This ensures that the targets having fractional velocities are not missed. The asymmetry in 
this neighborhood is compensated by choosing u min = u max - 1 and v min = - 1 . For 

the case of u max = v max = 1, p = 4 and P is given bv: 

P= {(0,0), (1,0). (0,1), (1,1)} (2.22) 

The algorithm then detects targets with a maximum velocity of 1 pixel per frame. However, 
when spatial integration is performed prior to dynamic programming, targets with larger 
sizes and velocities can be detected. 

On the other hand, if P - Q = {(0,0)} so that p - q = 1, the algorithm reduces 
to recursive temporal averaging, which gives the best performance for stationary targets. 
However, the performance of temporal averaging sharply degrades if the target, is moving, 
whereas that of dynamic programming algorithm does not. 

The output of the dynamic programming algorithm is an image, with large values at 
positions where the target strength is high. However, the pixels in the neighborhood of the 
target will also have a significantly large value. This can be resolved by using noil-maximal 
suppression, where the output is smoothed using a Gaussian filter with a — 1.0, and each 
pixel which is not a local maximum in its 3x3 region is set to zero. After this, only the pixels 
which are local maxima remain, which can be thresholded to obtain the target locations. 

It should be noted that separate processing should be performed if the targets are negative 
i.e., dark targets on a bright background. In the case of low-stop pre-processing, this is 
done by using the negative of the pre-processed image, whereas in the case of morphological 
pre-processing, both original minus open and closed minus original images are processed 
separately. 
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2.6 Composite system 


The above mentioned algorithms have been combined to form a composite system for target 
detection. The steps that form this composite system are: 

1. Temporal Averaging: This step is performed first, in the case of objects in a uniform 
background, having a very small image motion, such as those on a collision or near- 
collision course. In such a case, temporal averaging improves the SNR and reduces the 
processing rate required for subsequent steps. 

2. Pyramid construction with low-stop or morphological filtering: In this step, a pyramid 
is constructed to accommodate different sizes and velocities of objects. For pre- 
processing the images, low-stop or morphological filtering is performed at each pyramid 
level to remove background intensity. Low-stop filtering is more effective in low clutter 
situations, whereas morphological filtering [6] is more effective in suppressing back- 
ground clutter due to clouds and ground. 

3. Dynamic Programming: A dynamic programming algorithm [2] is performed on pre- 
processed frames to integrate the signal over a number of frames by taking the target 
motion into consideration. Non-maximal suppression and thresholding are then per- 
formed on the output. 

It should be noted that one or more of these steps can be bypassed so that any of the basic 
algorithms described above can be tested individually using the same system. 

2.7 Results using analog camera 

The above target detection algorithms were applied to real image sequences obtained from 
NASA. Figure 2.2 (a) shows an image from the sequence with the target aircraft flying away 
from the host aircraft. The sequence can be played in reverse to simulate the aircraft on a 
collision course. Since the aircraft on a collision course have a small image motion, temporal 
averaging was the optimal detection algorithm in this particular case. The aircraft was at. 
a distance of approximately 4 nautical miles (7.4 km), and was barely visible in a single 
image. Low-stop filter was applied before temporal averaging to remove the near-uniform 
background. After temporally averaging and thresholding, the aircraft was detect ed as shown 
in Figure 2.2 (b). Dynamic programming algorithm was performed on a sequence of images 
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(b) 


Figure 2.2: Target detection using temporal averaging: (a) Original image with a distant 
contracting target at 4 nautical miles. The target is approximately in the middle of the 
image. However, due to degradation of image quality, it is very faint, (b) Detection of 
the distant contracting target using low-stop filter pre-processing, temporal averaging and 
thresholding. A false alarm in the mid-left area is most likely due to a smudge on the camera. 

(after applying low-stop filter as pre-processing) in which an aircraft was flying from right to 
left across the image as shown in F igure 2.3 (a). Dynamic programming algorithm detected 
the aircraft with a low rate of false alarms. However, the target was dilated by the use of this 
algorithm. Clutter removal using morphological filtering was also explored. Figure 2.4 (a) 
shows a small aircraft flying in the middle-right part of the image. The image was actually 
obtained by averaging 10 motion compensated images from an image sequence, in which an 
aircraft was flying on the collision course. Application of morphological filter removed most 
of the clutter due to edges of large-sized features. This aircraft which was on a collision 
course, was retained. However, other small-sized features were also retained, resulting in a 
number of false alarms. The result is shown in Figure 2.4 (b). 

Chapter 3 presents a systematic performance characterization for temporal averaging as 
well as dynamic programming using statistical image models for digital cameras. It was 
observed that the algorithms performed very well when the background was clear. However, 
the performance degraded severely in presence of clutter. In the case of cluttered images, 
pre-processing using morphological filter worked better than that using low-stop filter. Most 
of the clutter was removed, but small sized clutter, especially due to specular reflect ion from 
water remained. Finally it was observed that the number of false alarms after applying the 


13 



Figure 2.3: Target detection using dynamic programming: (a) Original image frame with a 
translating target, (b) Location of the detected target using dynamic programming (following 
a low-stop pre-processing step). 



Figure 2.4: Detection using morphological processing: (a) An average of ten motion- 

compensated frames of an image sequence. The aircraft is in the middle-right part of the 
image, (b) Detection using morphological filter. False alarms due to other features are also 




algorithm, in general, was reduced but still significant. 


2.8 Data collection using digital camera 

The real data that was used for our previous work was captured using an analog camera 
and recorded using NTSC video, thus containing additional noise that should not be present 
when a digital camera is used on the actual flight. Hence, the performance of tin* algorithms 
should be characterized without the undue interference from video noise. For this purpose, 
a system w r as designed to capture image sequences from an aircraft using a digital camera, 
and record them digitally on a disk. The camera used was 1I< x 1A' Kodak MegaPlus ESl.O 
camera with the output at approximately 30 frames per second and a gray scale resolution 
of 8 bits. Hence, a bandwidth of 30 MBytes per second and a storage of 108 GBytes per 
hour of recording is required. 

To capture the video image sequences with these large bandwidth and storage require- 
ments, as well as perform the image processing operations in real time, a real-time image 
processing system with pipelined image processor called DataCube MaxPCI was procured. 
This system is a cost-effective way to meet high-throughput low-latency demands and has 
become popular among researchers working on real-time vision problems. The New Technol- 
ogy Disk (NTD) available with the DataCube MaxPCI has the required ability to perform 
high-speed digital image recording. NTD is a Redundant Array of Inexpensive Disks (RAID) 
that enables high-speed lossless digital image recording and playback. The image data can 
be recorded and played back at a real-time frame rate (overall 40 MBytes /sec). 

Image data has been obtained from flight tests conducted at NASA Langley Research 
Center. A sample image captured using this system is shown in Figure 2.5. Work on 
implementing the detection algorithms on the DataCube hardware using these images is 
described in [11], Detection of objects crossing the aircraft (instead of those on a collision 
course) was performed on the DataCube system in real time. The algorithms used for this 
purpose are described in Chapter 7. 
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Figure 2.5: An image captured from an aircraft using the digital recording system. The 
target aircraft is in the middle-right part of the image. 
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Chapter 3 


Performance Characterization of 
Detection Algorithms 


The most common tool used to characterize the performance of a detection algorithm is a 
plot of its probability of mis-detection versus its probability of false alarm, as some tuning 
parameter is changed. This plot is commonly known as the “receiver operating curve” of the 
system, or ROC, for short. Although ROCs are useful to represent the system performance 
as a parameter is varied, they have several limitations. One disadvantage in using ROCs 
is due to the fact that only one parameter can be varied at a time. Thus, if the effect of 
variations of multiple variables needs to be studied, a different curve must be determined 
for each of these variables making the analysis of the system performance more difficult. A 
second disadvantage is that it is difficult to compare ROCs for different algorithms since they 
may take different variables into account. Finally, obtaining ROCs is an expensive process 
where factorial experiments must be carried out to determine the system performance at all 
performance levels with the probability of false alarms ranging from zero to one. 

In Kanungo et, al. [10], a methodology which was adapted from the psychology literature, 
and is discussed next, was proposed as an alternative characterization tool to summarize mul- 
tiple ROCs into a single curve, solving the problems described above. This chapter describes 
how to use this methodology to characterize the performances of the algorithms described 
in Chapter 2. The performance of the dynamic programming algorithm is compared against 
that of temporal averaging, and thresholding of a single image frame. 
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Figure 3.1: Steps for performance characterization: (a) Step 1: Obtain the frequency dis- 
tributions of the evidence measure for images with and without target, (b) Step 2: Obtain 
the ROC. (c) Step 3: Determine the optimal operating point using either the expected cost 
or the probability of detection given the probability of false alarm, (d) Step 4: Plot the 
threshold value corresponding to the optimal operating point versus a variable of interest. 


3.1 Performance characterization methodology 

For the sake of completeness, the methodology for performance characterization proposed 
in [10] is described here. Consider a detection algorithm that must report whether a given 
image has a target or not. Typically, the algorithm would compute some measure of evidence 
of target presence arid compare it to some given threshold value. Whenever the evidence 
measure is greater than the given threshold, a target would be reported. The performance 
of the algorithm is affected by several factors, such as image contrast, target size, complexity 
of the background, etc. The effect of variations of these variables on the overall performance 
can be measured through the use of equivalent effects of some critical signal variable by 
following the four steps described below. 

1- Obta in evidence distributions: The first step consists on estimating distributions of 
evidence measures, one for images with target and another for images without target, 
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as illustrated in Figure 3.1 (a). This estimation is done non-parametrically by randomly 
presenting the algorithm with images of both types and recording the frequency of the 
evidence measure values reported by the algorithm, using a histogram. It should be 
noted that the frequency distributions are used here only for estimating the false alarm 
and mis-detection rates. The evidence measure which is thresholded may or may not 
be derived from these distributions according to Baves’ rule. Hence, the performance 
of optimal as well as non-optimal detectors can be characterized by this approach. 


2. Obtain ROCs: The second step consists on constructing an ROC as the one shown in 
Figure 3.1 (b) by varying the threshold used by the algorithm to compare against the 
computed evidence measure. False alarms occur when a pixel in the given image does 
not contain a target, but the evidence measure is greater than the threshold being 
used. Mis-detections occur when the given image contains a target, but the evidence 
measure is less than the threshold. The probabilities of false alarms and mis-detections 
can be approximated by their frequency ratios: 


Number of false alarms 
Total number of input pixels without target 
Number of mis-detections 
Total number of targets in input images 

where H 0 and denote the hypotheses corresponding to the absence and presence’ of 
a target, respectively. 


P(MD ) = P(H 0 \H X ) 


P(FA) = P(H X \H 0 ) 


3. Determining the optimal operating point: The optimal operating point (or its corre- 
sponding threshold value) can be specified in different ways, depending on how much 
prior knowledge is available. If the prior probabilities and costs are known, the optimal 
operating point can be defined as the one minimizing the expected cost. Let C" 10- Gn , 
C u, and Coo, be the costs of a false alarm, a mis-detection, a correct detection, and a 
correct rejection, respectively. The expected cost is then given by: 

E[C] = [P(H 0 \H 0 )Coo + Pm\H 0 )C l0 \P(H 0 ) 

+ [P(Ho\Hi)Coi + P{Hq\H])Cu\P{H\) (3.1) 

The optimal operating point is found by minimizing E[C) with respect to the threshold 
to be used by the algorithm. In the most likely case when the costs are difficult to set, 
an alternative way to define the required operating point is to use the Nevman-Pearson 
criterion -- i.e., to maximize the probability of detection for a given probability of false 
alarm. 
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Independently of which definition is used, the optimal operating point depends on the 
signal to noise ratio (SNR) in the input image. For example, increasing the target 
contrast results in an increase of the SNR and, hopefully, in an improvement of the 
algorithm performance for a given threshold value. The optimal operating points for 
different SNRs can be found by repeating steps 1 and 2 for the corresponding SNR 
values and determining the optimal point for each of the resulting ROCs. Once this 
is done, a graph of the expected cost or the probability of detection versus SNR can 
be plotted, depending on which definition of operating point is being used. This is 
illustrated in Figure 3.1(c). Finally, let SNR T and T be the SNR and the associated 
threshold values for the optimal operating point for a given level of performance, as 
shown in the figure. The level of performance is specified by either a desired expected 
cost of classification or a desired probability of mis-detection, again, depending on 
which optimal criterion is used. 

4. Performance analysis with respect to variables of interest: Besides SNR. other factors 
affect the algorithm performance and merit study. Examples are the size of the target, 
the amount of target motion on the images, and the amount and nature of image 
clutter. In order to study these effects, steps 1 to 3 are repeated for different values of 
variables representing these variations. These results are then summarized in a graph 
where the threshold T determined in step 3 is plotted against the value of the variable 
of interest, as shown in Figure 3.1(d). A fairly flat plot indicates that the effect of the 
variable being considered on the optimal operating point of the algorithm is negligible. 
On the other hand, a steep plot indicates that the variable has a high impact on the 
performance. 

It should be noted that a smaller SNR threshold T implies better performance, since 
weaker targets can be detected with the same given rates of false alarms and mis-deteetions. 
Measuring the performance in terms of the SNR threshold makes it easier to measure and 
compare the performance of different algorithms, or the same algorithm with different pa- 
rameters. This is because the variables, such as the false alarm and mis-detection rates are 
eliminated from the curves, making place for other parameters. 
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3.2 Experimental protocol 

In this section, the experimental protocol used to characterize the performance of the target 
detection algorithms, is described in detail. The protocol consists of the following compo- 
nents, specifying how to 

1. Generate images of simulated targets, 

2. Apply the detection algorithm, 

3. Estimate the rates of false alarms and mis-detections (ROCs) for different sets of 
parameters, and 

4. Characterize the algorithm performance by condensing the ROCs into a performance 
curve. 

3.2.1 Image generation 

In order to characterize the performance of the detection algorithm, it is applied to sequences 
of synthetic images with and without targets. W hile the images with targets are used to 
estimate the mis-detection rate, the images without targets are used to estimate the false 
alarm rate. The images can have the following different types of backgrounds: 

1- Synthetic noise from camera model: The background is assumed to have a constant 
value Abg. The noise is artificially simulated, using the camera noise model. 

2- Real noise from a digital camera: The background images are taken from a sequence 
of images obtained from a digital camera looking at a scene with constant intensity 
such as clear sky, or white paper. 

3. Real background an from analog camera: The background images are obtained using 
a sequence of images with significant clutter. The sequence, which was provided by 
NASA, was captured using an analog camera mounted on a flying aircraft. Figure 3.2 
shows a typical frame of this sequence. 

Generation of image sequences 

To estimate the number of false alarms, the background images themselves, without any 
addition of targets are used directly. The size of these images is N x x N y . For estimation of 
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Figure 3.2: A sample image from the real background sequence provided by NASA. The 
image sequence was taken from an analog camera mounted on an aircraft . 


the rate of mis-detections, simulated targets are inserted in the background images generated 
as described below. For each simulation, a target hie is created having information on the 
position, velocity, size, amplitude and each target to be placed in an image. The image size 
is taken as N x x N y . The number of targets to be inserted in every image is Nf ariJ . The 
target trajectories are generated in such a way that the detection of one target does not 
interfere with the detection of another. This is accomplished by drawing a window around 
each target trajectory. The next generated trajectory is valid only if the window around 
it does not overlap with the windows around the previously generated targets. Otherwise, 
the procedure is repeated by generating another trajectory, until the total number of valid 
trajectories is N targ . 

The velocity (V x , V y ) of the targets is uniformly distributed so that —u ma:r < \ r x < u ma:T 
and -v max < V y < v max . The position of the targets is specified for the last frame i.e. 
when the detection is completed. The position of the target in other frames is given by 
(x — \ x At,y — V y At), where At is the time-interval between the given frame and the last 
frame. 

A target can be a point target, or have a specified height and width. The size of the target 
is given by s x x s y . The target amplitude is given by A. For point targets, the amplitude 
corresponds to the contrast of the pixel it occupies, with respect to the background. However, 
for an extended target, the contrasts of all the occupied pixels are given by the product of 
the target amplitude and the fraction of the area in the respective pixel that is covered by 
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the target. 

Figure 3.3 (a) shows the trajectories of simulated targets to be added to an image, and 
Figure 3.3 (b) shows a zoomed part on a portion of the image. The end of the trajectories 
are marked by blobs. The black box around the target denotes the region where another 
target cannot be present, to reduce the interference between the targets. 

Once the file describing the targets is created, an image sequence of Nf rame frames is 
generated. For each frame, the position of the targets are calculated, and the targets are in- 
serted accordingly. For point targets, the amplitude is added to the background image in the 
target position pixel. For extended targets occupying a number of pixels (fully or partially), 
the product of the amplitude and the fractional occupancy is added to the background image 
at that pixel. 

Addition of noise 

Two types of camera noise [8, 11], the Fixed Pattern Noise (FPN) and the temporal noise 
are added to the sequences created using synthetic backgrounds. FPN has two components, 
additive and multiplicative. The parameters of this noise change from pixel to pixel, but do 
not change with time. The parameter values for each pixel are determined a priori using t he 
camera, and stored as images. On the other hand, the temporal noise is completely random, 
and is generated separately for each frame. The temporal noise approximately follows a 
Gaussian distribution with a variance of: 

® noise ^'0 T ] / 

where / is the expected gray value of the pixel, and w (h uq are the parameters of the particular 
camera. However, since the background amplitude A bg is constant for the experiments with 
simulated noise, and the target amplitude .4 <C A bg , we have I = A + A bg ~ A bg and af loise 
is approximately constant, given by: 


°nmse - + W X A bg 

Hence, the noise can be approximated as Gaussian noise with a constant standard deviation 
of Vnoise- The values of the parameters for the particular camera were estimated [11] as 
w 0 = 0.171 and uq = 0.0056. For background ,4 = 128, this gives a noise = 0.942. The image 
is quantized to give the output in byte format. 
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Figure 3.3: Detection using dynamic programming: (a) Simulated targets trajectories. There 
are 200 targets, and the image size is 960 x 960. The end of the trajectory is marked by 
a blob. The targets are separated so that there the interference between them is reduced. 
The black box around the target denotes the region where another target cannot be present, 
(b) A zoomed part of the target trajectory image, (c) The dynamic programming output 
of a typical experiment (before non-maximal suppression), (d) Zoomed part of the output, 
(e) The dynamic programming output of the same experiment without adding targets - i.e., 
false alarms, (f) Zoomed part of the output. 
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3.2.2 Algorithm application 

The target detection algorithm whose performance is to he characterized is applied to each 
simulated image sequence. In the cases of synthetic images, and digital camera sequences, 
fixed pattern noise (FPN) can he corrected in advance by using pre-computed parameters 
of FPN for each pixel. However, these parameters are perturbed by a random amount 
corresponding to their estimated covariance, to model the error in estimating these val- 
ues. Experiments are performed without and with correction of FPN, and the results are 
compared. 

According to the type of background used, preprocessing in the form of a low-stop filter or 
a morphological filter are performed before applying dynamic programming. After dynamic 
programming is applied, non-maximal suppression is performed to ensure correct counting of 
false alarms and mis-detections. The output (before non-maximal suppression) of a typical 
experiment with 200 targets is shown in Figure 3.3 (c) and (d) where the latter shows a 
zoomed part of the output. 

3.2.3 Estimation of false alarms (FA) and mis-detections (MD) 

The algorithm to be characterized is applied on the image sequences with as well as without 
targets. The sequences without targets are used to estimate the false alarm rate, whereas 
the sequences with targets are used to estimate the mis-detection rate. 

For the false alarm rate, the histogram of the output image is obtained. Using this 
histogram, the false alarm rates for different thresholds can be obtained. For the mis- 
detection rate, only the pixels in a specified window of 5 x 5 pixels around the specified 
target position are checked. For each such window corresponding to a single target, the 
maximum value of the algorithm output is taken. A histogram of these maximum values is 
formed, and processed to obtain the mis-detection rates for different thresholds. The false 
alarm and mis-detection rates are averaged over a number of simulations AV.a and AA/d, 
respectively. 

The number of simulations to test can be specified so that the standard deviation in the 
estimate of the false alarm or mis-detection rate is below a given value. This can be seen 
by observing that the occurrence of an event such as a false alarm or a mis-detection can 
be modeled as a Poisson process and therefore the variance of the total number of events is 
equal to the mean. Thus, if n events are observed, the standard deviation of the absolute 
error in the number of events is 01, and that of the relative error is 1/01. For example, for 
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n = 10 events, the error a is 3.2, or 32 % of the number of events. This error estimate can 
be confirmed by measuring the variance of these rates across the simulations. 

3.2.4 Performance characterization 

Using the estimated false alarm and mis-detection rates, the receiver operating curve (ROC) 
can be plotted showing the rate of mis-detection against the rate of false alarms. The mis- 
detection rate for a specified false alarm rate (F At) is noted from the curve. The simulations 
are repeated for a number of signal amplitudes A. The ratio of this amplitude to noise level 
corresponds to the SNR. The value of the signal amplitude for a specified mis-detection rate 
(MD t ), and the above false alarm rate is obtained. This is considered as the threshold signal 
value ( A t ). The number of simulations used is at least Nfa = 10/ F At in the case of false 
alarms and N M d = 10 /MDt in the case of mis-detections, so that for the rates FA T and 
MD t , an average of at least 10 events would be observed, giving an error a of at most 32 
%. Due to constraints on the execution time, larger number of experiments were not used, 
although they would be desirable for reducing this error. 

Other parameters, such as the size of the target, can be varied one at a time, and the 
variation of At can be plotted against the respective parameter to determine the effect of 
the parameter on the algorithm performance. 

3.3 Results 

The target detection algorithm was tested on 3 categories of images as described in the 
protocol. The results are shown and compared in the following sections. 

3.3.1 Synthetic noise from camera model 

In this case, the noise was synthetically generated using the noise model of the Kodak 
Megaplus ES 1.0 digital camera. Targets of varying size were added for mis-detection anal- 
ysis. Experiments without and with correction of FPN were performed. 

Figure 3.4 (a) and (b) show the plots of the false alarm and mis-detection rates, re- 
spectively, against the threshold value, for experiments without FPN correction. The mis- 
detection rates are shown for a number of signal amplitudes for 1 x 1 targets. The mis- 
detection rate is measured as the ratio of the average number of mis-detections, to the total 
number of targets in a simulation. However, the false alarm rate is measured as the average 
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Table 3.1: Table of parameters used for the experiments with the following image categories: 
(1) Synthetic noise from camera model. (2) Real noise from a digital camera, (3) Real 
background from an analog camera. 


Description 

Parameter 

Category 



(1) 

(2) 

( 3 ) 

Image x size 

-V, 

960 

960 

640 

Image y size 

N 

960 

960 

480 

No. of targets 

AT 

1 ' farg 

200 

200 

50 

Maximum x velocity 

^■hnax 

1 

1 

1 

Maximum y velocity 

1 max 

1 

1 

1 

x size 


0.5 to 2 

2 

2 

y size 

Sy 

0.5 to 2 

2 

2 

Amplitude 

A 

1.0 to 15.0 

1.0 to 6.0 

10.0 to 70.0 

Number of frames 

^ frame 

32 

32 

32 

Background value 

Afrg 

128 

~ 200 

not used 

Noise standard deviation 

“A noise 

0.942 

not used 

not used 

Forgetting factor 

rt 

15/16 

15/16 

15/16 

Number of FA simulations 

Nfa 

500 

1 

1 

Number of MD simulations 

Nmd 

50 

10 

10 

Threshold FA rate 

FA r 

0.02 

10 

10 

Threshold MD rate 

M Dr 

0.001 

0.01 

0.01 
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number of false alarms per simulation, instead of the ratio of the number of false alarms to 
the total number of pixels. This is done to give a better idea of the algorithm performance. 

Figure 3.4 (c) shows the plot of mis-deteetion rate against false alarm rate for different 
amplitude values for 1 x 1 targets. The point of threshold false* alarm rate FAj is set to 0.02 
false alarms per simulation, which corresponds to a total of 10 false alarms for N FA = 500 
simulations. Figure 3.4 (d) shows the plot of mis-deteetion rate against the amplitude values 
for the above rate of false alarms. The A T for the threshold mis-detection rate of M D T is 
interpolated, and marked as a circle. The MD r is set to a probability of 0.001 per target, 
which corresponds to an average of 0.2 mis-detections per simulation for a simulation with 200 
targets, or a total of 10 mis-detections for Nmd = 50 simulations. The corresponding graphs 
for the case where fixed pattern noise compensation was applied are shown in Figure 3.5. 

The above experiments are repeated for other sizes of targets, and the A T calculated from 
these is plotted against the size of the target. Resulting plots for the experiments without 
FPN correction are shown in Figure 3.6 (a) for square targets (size xxx) and in Figure 3.6 (b) 
for rectangular targets (size 1 x x). The corresponding results for the experiments with FPN 
correction are shown in Figure 3.6 (c) and (d). The threshold amplitudes for various sizes 
are tabulated in Table 3.2. It is seen that larger targets require smaller signal amplitudes 
for detection implying better performance. Similarly, the signal amplitudes required when 
FPN correction is applied are much smaller than those when the correction is not applied, 
implying better performance in the former case. 


Table 3.2: Results of dynamic programming algorithm on simulated image sequences wit hout 
and with FPN correction. Threshold amplitudes are shown for false alarm rate of 0.02 per 
simulation and mis-detection rate of 0.001 per target. 


Size 

No FPN correction 

With FPN correction 

1 x 1 

14.85 

4.72 

1 x 1.5 

11.43 

3.48 

1 x 2 

9.38 

3.10 

1.5 x 1.5 

10.49 

2.55 

2x2 

6.35 

2.04 
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False Alarm Rate size: 1x1 



(<■) (d) 


Figure 3.4: Results for camera noise model without FPN correction: (a) Plot of FA rate 
(average number per simulation) against threshold (b) Plot, of MD rate against threshold, 
for a number of signal amplitudes (higher amplitudes towards right.) for lxl targets, (c) Plot 
of MD rate against FA rate (for marked amplitude). The data points are marked as crosses. 
The MD rate when FA rate is FAj = 0.02 per simulation is interpolated, and plotted as 
circle, (d) Plot of MD rate against amplitude for FA rate of FA T = 0.02 per simulation. 
The value amplitude when MD rate is MDf = 0.001 per target is interpolated and marked 
as a circle. 
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(a) (b) 



Figure 3.5: Results for camera noise model with FPN correction: (a) Plot, of FA rate (average 
number per simulation) against threshold (b) Plot of MD rate against threshold, for a number 
of signal amplitudes (higher amplitudes towards right) for lxl targets, (c) Plot of MD rate 
against FA rate (for marked amplitude). The data points are marked as crosses. The MD 
rate when FA rate is F At = 0.02 per simulation is interpolated, and plotted as circle, 
(d) Plot of MD rate against amplitude for FA rate of FA r = 0.02 per simulation. The value 
amplitude when MD rate is MD r = 0.001 per target is interpolated and marked as a circle. 
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size change size change 



Figure 3.6: Performance curves for simulated targets: (a) Plot of amplitude against the 
target size (x x x) for experiments without FPN correction. The data points are marked as 
crosses, (b) Plot of amplitude against the target size (1 x x). (c) and (d) Corresponding 
plots for experiments with FPN correction. 
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3.3.2 Real noise from a digital camera 

In this case, instead of synthetically generating the noise, background images captured using 
the Kodak Megaplus ES 1.0 digital camera looking at the sky were used. Targets of size 
2x2 pixels were synthetically added for the mis-detection analysis. Experiments without 
and with correction of FPN were also performed. 

The false alarm threshold was set FA r = 10 per simulation, resulting in a total of 10 
false alarms for Np& = 1 simulation. The mis-detection threshold was set to MDt — 0.01 
per target, corresponding to 20 mis-detections for N M p = 10 simulations with N targ = 200 
targets. Unfortunately, the performance at lower rates of false alarms and mis-detections 
could not be reliably estimated because of the limited number of background images available. 
However, one can extrapolate the false alarm and mis-detection rates to study the behavior of 
the algorithm for lower rates. Due to the normal distribution of noise, even a small increase 
in the threshold reduces the false alarm and mis-detection rates dramatically. Hence, a 
somewhat higher target amplitude can be expected to reduce these rates to an acceptable 
level. 

In the case of the experiments without FPN correction, the plot of mis-detection rate 
against false alarm rate for different levels of target amplitude is shown in Figure 3.7 (a). 
The plot of mis-detection rate against SNR for false alarm rate of FA r = 10 per simulation 
is shown in Figure 3.7 (b). The corresponding plots for the experiments with FPN correction 
are shown in Figure 3.7 (c) and (d). The target, strength required for detection at the specified 
rates of false alarms and mis-detections are marked by circles in Figures 3.7 (b) and (d). 
It can be seen that the target strength required when FPN is not corrected (At — 3.22) 
is higher than that required when FPN correction is applied (At = 1.86), implying better 
performance in the latter case. 

3.3.3 Real background an from analog camera 

In this case, a real aerial background, obtained from an analog camera used during a flight 
test was employed. Targets of size 2x2 pixels were synthetically added for mis-detection 
analysis. 

In order to suppress the background, low-stop and morphological pre-processing were 
separately applied, and the results compared. Since the background was cluttered, a much 
higher signal was required for satisfactory detection. Even then, the false alarm rate does not 
reduce sufficiently, thus showing that more post-processing would be required after applying 
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Figure 3.7: Results for real noise from camera for 2 x 2 targets: (a) Plot of MD rate against 
FA rate (for marked amplitude) for images without FPN correction. The data points are 
marked as crosses. The MD rate when FA rate is FA T = 10 per simulation is interpolated, 
and plotted as circle, (b) Plot of MD rate against amplitude for FA rate of FA r = 10 
per simulation. The data points are marked as crosses. The value of At where MD rate is 
MD t = 0.01 per target is interpolated and marked as a circle, (e) and (d) Corresponding 
plots for FPN corrected images. 
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the algorithm. However, since the number of false alarms (plus true candidates) would be 
small after this processing, the time complexity of subsequent, algorithms would be reduced 
significantly. The techniques described in Chapter 5 can be used to separate the remaining 
background clutter from the genuine targets. These techniques utilize the difference in the 
image translation and expansion between an object on a collision course, and t ho background 
clutter. 

The false alarm threshold was set FA r = 10 per simulation resulting in a total of 10 false 
alarms for N FA -- 1 simulation. The mis-deteetion threshold was MD r = 0.01 per target, 
corresponding to 10 mis-detections for N MD = 20 simulations with N targ = 50 targets. Again, 
unfortunately, lower rates for false alarm and mis-detection cannot be reliably estimated due 
to the limited number of background images available. 

The results for the morphological filter and the low-stop filter are shown in Figures 3.8 and 3.9, 
respectively. It can be seen that, the target strength required when the morphological filter 
( A t — 17.8) is used is much lower than that required when the low-stop filter (A r = 57.8) 
is used. The morphological filter is thus better, and the reason for this is that the morpho- 
logical filter reduces clutter corresponding to large features, whereas the low-stop filter does 
not do this effectively. However, both result in much poorer performance than that obtained 
with a digital camera with clear background. 

3.3.4 Comparison with other methods 

The performance of the dynamic programming algorithm was also compared with other 
methods such as simple thresholding on a single frame, and temporal averaging on the 
same number of frames. The comparison was made using FPN correction on images with 
simulated camera noise. The results of applying the dynamic programming algorithm, simple 
thresholding on a single frame, and temporal averaging on image sequences with 2x2 
moving targets are shown in are shown in Figures 3.10, 3.11 and 3.12 respectively. Temporal 
averaging was also applied on image sequences with stationary targets instead of moving 
targets, the results of which are shown in Figure 3.13. 

Similar experiments were performed with other target sizes. Table 3.3 shows the compar- 
ison the for these algorithms using various target sizes. The plots of the threshold amplitudes 
against target sizes are shown in Figure 3.14. Again, smaller threshold amplitudes imply 
better performance as explained before. 

It can be seen that the performance of single frame thresholding, as well as temporal 
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Figure 3.8: Results for real cluttered background for 2 x 2 targets using morphological filter 
in the preprocessing: (a) Plot of FA rate (average number per simulation) against threshold 
(b) Plot of MD rate against threshold, for a number of signal amplitudes (higher amplitudes 
towards right), (c) Plot of MD rate against FA rate (for marked amplitude). The data 
points are marked as crosses. The MD rate when FA rate is FA r = 10 per simulation is 
interpolated, and plotted as circle, (d) Plot of MD rate against amplitude for FA rate of 
F A t — 10 per simulation. The data points are marked as crosses. The value of At where 
MD rate is MD T = 0.01 per target is interpolated and marked as a circle. 
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False Alarm Rate 


size: 2x2.lstop 



Figure 3.9: Results for real cluttered background for 2x2 targets using low stop filter in 
the preprocessing: (a) Plot of FA rate (average number per simulation) against threshold 
(b) Plot of MD rate against threshold, for a number of signal amplitudes (higher amplitudes 
towards right), (c) Plot of \ID rate against FA rate (for marked amplitude). Tin' data 
points are marked as crosses. The MD rate when FA rate is FA, = 10 per simulation is 
interpolated, and plotted as circle, (d) Plot of MD rate against amplitude for FA rate of 
F At = 10 per simulation. The data points are marked as crosses. The value of A-, where 
MD rate is MDt - 0.01 per target is interpolated and marked as a circle. 
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averaging are much poorer than that of the dynamic programming. However, if stationary 
targets are used instead of moving targets, the performance of temporal averaging is slightly 
better than that of dynamic programming, showing that temporal averaging is the best 
choice when the targets are stationary. 

Table 3.3: Results of target detection algorithms on simulated image sequences with FPN 
correction. Threshold amplitudes are shown for false alarm rate of 0.02 per simulation and 
mis-detection rate of 0.001 per target. 


Size 

Dynamic 

prog. 

Single frame 
thresh. 

Temp. Avg. 
(moving) 

Temp. Avg. 
(stat.) 

1 x 1 

4.72 

23.03 

33.82 

4.63 

1.5 x 1.5 

2.55 

10.68 

16.99 

2.11 

2x2 

2.04 

8.17 

11.67 

1.65 
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Figure 3.10: Results for camera noise model with FPN correction for 2x2 targets using dy- 
namic programming: (a) Plot of FA rate (average number per simulation) against threshold 
(b) Plot of MD rate against threshold, for a number of signal amplitudes (higher amplitudes 
towards right), (c) Plot of MD rate against FA rate (for marked amplitude). The data 
points are marked as crosses. The MD rate when FA rate is FA T = 0.02 per simulation is 
interpolated, and plotted as circle, (d) Plot of MD rate against amplitude for FA rate of 
F A t - 0.02 per simulation. The value of A r when MD rate is MD r = 0.001 per target is 
interpolated and marked as a circle. 


38 




size: 2x2 



(a) 


size: 2x2 




(b) 
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Figure 3.11: Results for camera noise model with FPN correction for 2 x 2 targets by- 
thresholding a single frame, (a) Plot of FA rate (average number per simulation) against 
threshold (b) Plot of MD rate against threshold, for a number of signal amplitudes (higher 
amplitudes towards right), (c) Plot of MD rate against FA rate (for marked amplitude). The 
data points are marked as crosses. The MD rate when FA rate is FA T = 0.02 per simulation 
is interpolated, and plotted as circle, (d) Plot of MD rate against amplitude for FA rate of 
FA t = 0.02 per simulation. The value of A r when MD rate is MD T = 0.001 per target is 
interpolated and marked as a circle. 
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(c) (cl) 


Figure 3.12: Results for camera noise model with FPN correction for 2 x 2 moving targets 
using temporal averaging, (a) Plot of FA rate (average number per simulation) against 
threshold (b) Plot of MD rate against threshold, for a number of signal amplitudes (higher 
amplitudes towards right), (c) Plot of MD rate against FA rate (for marked amplitude). The 
data points are marked as crosses. The' MD rate when FA rate is FA r = 0.02 per simulation 
is interpolated, and plotted as circle, (d) Plot of MD rate against amplitude for FA rate of 
FA t = 0.02 per simulation. The value of A r when MD rate is MD r = 0.001 per target is 
interpolated and marked as a circle. 
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Figure 3.13: Results for camera noise model with FPN correction for 2 x 2 stationary targets 
using temporal averaging, (a) Plot of FA rate (average number per simulation) against 
threshold (b) Plot of MD rate against threshold, for a number of signal amplitudes (higher 
amplitudes towards right), (e) Plot of MD rate against FA rate (for marked amplitude). The 
data points are marked as crosses. The MD rate when FA rate is FA T = 0.02 per simulation 
is interpolated, and plotted as circle, (d) Plot of MD rate against amplitude for FA rate of 
F At = 0.02 per simulation. The value of At when MD rate is MDt — 0.001 per target is 
interpolated and marked as a circle. 
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Figure 3.14: Performance comparison of several algorithms: Plot of amplitude against the 
target size ( x x x) for experiments without FPN correction using (a) Dynamic programming, 
(b) Thresholding single frame, (c) Temporal averaging (moving targets), (d) Temporal av- 
eraging (stationary targets). 
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Chapter 4 

Theoretical Performance of Detection 
Algorithms 


In this chapter, the approximate theoretical performance of the algorithms presented in 
Chapter 2 are derived. The theoretical derivations are based on the paper by Tonissen 
and Evans [18]. The theoretical performance is compared with the experimentally observed 
performance described in Chapter 3. Effects of approximations used in the derivations are 
also described. 


4.1 Dynamic programming algorithm 

The dynamic programming algorithm described in Chapter 2 can be summarized as follows: 

1. Initialization: For all pixels (x.y) and all velocities ( u,v ), set 

F(x, y; n, v. 0) = 0 

2. Recursion: At time k, set 

F(x, y; u, v, k) = (1 — a)f(x, y: k) + o max F(x - u — x' , y — v — y'; u, v: k — 1) 


3. Termination: At time K, take 

Fmax{x, y. K) = max F(x,y: u,v:K) 
(u,v)€P 
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The number of elements in sets P and Q are denoted by p and q, respectively. The values 
of p = q = 4 have been used in our implementation, with u,v € { — 1,0} and x',y' € {0, 1}. 
Note that for theoretical analysis, the recursion step is replaced by: 

F(x, y\ u, v; k) = f(x, y; k) + a max F(x — u — x', y — v — y'\u,v\ k — 1) (4.1) 

(x\y')eQ 

However, this only changes F by a scale factor, and since both signal as well as noise would 
be scaled equally, SNR analysis does not change. 

4.2 False alarm and mis-detection probabilities 

Probability of false alarms P/7,4 is the probability that there is at least one state exceeding 
the threshold IV out of p velocity states at the final output time A', for the pixel where there 
is no signal in its neighborhood - i.e., hypothesis H 0 . 

P FA {x,y) = Pr max F(x,y;u,v; K) > V T \H 0 = 1 - [Po,ft (VV)] p (4.2) 

(u,v)€P 

where Po,k(Vt) denotes the probability of F for hypothesis Ho at time K , being less than 
or equal to the threshold V'V- 

Probability of mis-detection P.\jd is the probability that there is no output with correct 
velocity (u,v) exceeding the threshold at time K , within a neighborhood R of size r 4- 1. 
where one cell contains signal i.e., hypothesis Hi - and the other r cells are noise. This 
allows for some tolerance in the location of target. For example, a 5 x 5 neighborhood 
corresponding to r + 1 = 25 gives a tolerance of ±2 pixels in the location of the target . On 
the other hand, a 1 x 1 neighborhood consisting of only the target position corresponds to 
r + 1 = 1 or r = 0 giving no tolerance for the target position. 

Pmd(x, y; U, v ) = Pr max F(x - x' ,y - y'\ u, v, K) < VV|Pi 

_x t ,y , £R 

= Pi,k(V t )[Po,k(Vt)Y (4.3) 

where Pi,k-(VV) denotes the probability of F for hypothesis H\ at time K being less than or 
equal to the threshold VV- 

4.3 Normal approximations 

For an analytic solution of the performance of the dynamic programming algorithm, the 
distributions of the intermediate outputs can be approximated using normal approximations. 
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Table 4.1: Values of //, and a 2 for a number of values of q. 


Q 

Hq 

G q 

1 

0 

1 

4 

1.029 

0.491 

9 

1.485 

0.3574 

25 

1.965 

0.2585 

49 

2.241 

0.2168 


Consider q independent standard normal variables wq ~ iV(0, 1). The cumulative distribution 
function (CDF) of the maximum of these variables is given by: 


= Pr[wi < re] = [<f>(uj)] 9 (4-4) 

l 

where $(•) is the CDF of a standard normal variable. The probability density function 
(PDF) is the derivative of the CDF given by: 


P(w) = Pr 


max in, < w 

l 


p(w) = ^(w)] 9 ‘G(t/;) 


(4.5) 


where G(-) is the standard normal PDF. 

This distribution of maximum of q standard normal variables can be approximated as 
a normal distribution N(fi q ,a q ), where fi q and denote the mean and the variance of the 
actual distribution. These are computed using numerical integration, and are tabulated in 
Table 4.1 for different values of q. 

For general normal variables z f ~ A r (/x, a 2 ), one can substitute: = fx + auq where u?j 

are standard normal variables. The maximum of Zj is approximately normally distributed 
with mean and variance given by: 

F[max Z{\ = ft + rrCftnax »?,;] = // + o/i q 

r[maxzj] = a 2 V' [max »;*•] = a 2 a 2 (4.6) 

Let the input at any time k be normally distributed, both in absence and presence of the 
target, so that: 


f(x,y: k\H 0 ) ~ A’(//„,^) , f(x,y\ k\H]) ~ N{fi s ,a 2 s ) 


(4.7) 
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Then, the distributions of the output F at time k will also be approximately normally dis- 
tributed so that 


F{x, y; u, v; k\H 0 ) ~ N(M U , S% k ) , F(x, y, u, v, k\Hi) ~ JV(M U , S^) (4.8) 

where the M and S parameters are calculated below. 


4.4 False alarm analysis 

For noise pixels, we have: 

F(x, y; u, v\ 0) = 0 

F(x,y;u,v,k) = f(x,y;k) + a max F(x — u — x',y — v — y';u.v\k ~ 1) 

(x',y')€Q 

= N(MomSIP (4.9) 

Using equation (4.6), the mean and variance parameters at time k can be recursively ex- 
pressed as: 


M 0)0 — 0 , Mo,k 

$0,0 = 0 ’ ^O k 


Hn + «(A/o,fc-l + PqSo ik -l) 

_2 , 2 _2 c2 

°n + Q 


(4.10) 


Solving these recursive equations yields expressions for mean and variance at time K: 


c 2 

°0,K 


o~ 


1 - 

1 — a 2 a 2 


l-a K 
1 — a 


K - 1 

p n F Oipq ^ ' (ck So'f( _ i ) 

i=0 


(4.11) 


To get approximate closed-form expressions for M 0 ,^, one can write So,* as: 


5 0 , fc — CTn 


\ 


1 - a 2K a 2K 
1 — a 2 a 2 


F 


a 2 a 2 


(■ 




(4.12) 


where 7* is dependent on k but always lies between 0 and 1. Using 7* = 1/2 is equivalent 
to using the first order term of binomial expansion, whereas 7*, = 0 corresponds to assuming 
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that So,* remains approximately constant with k, which is justifiable, since er 2 is quite small. 
Accordingly, we have: 


1 _ a K K- 1 

Mo,K = — — Pn + a Po Y (a k So.K-k-i) 

k = 0 

afi q a n r 1 - 

Pn + 


l—o 
1 — a h 


1 


o 


A 3 


a 2 a‘ 2 q 


1 


a 


7 a 


1 - a(T q 


(4.13) 


where 7 is a function of all 7 * and also lies between 0 and 1 . Values of 7 - 0 and 7 = 1/2 can 
be used as the zero order and first order approximations, respectively. For K — > 00 , a ^ 1 
such that a h « 1 (also, a 2h <C 1 ), we have: 


9 2 

°0,A 


o: 


o: 2 <t 2 




0,K 


a 


Pn + 


(XlJtqOn 


(4.14) 


For the case when a = 1, the sum changes from (1 — a K )/(l - a) to K. Hence, the 

expressions become: 


S ' 2 

°0,K 

Mq,k 


= o~ 


1 - of 


— A/x„ + 


flq(7n 


A 


K - 


1 - KV 


— a 


q L 


1 


ot 


Finally, the probability of false alarms is: 


(4.15) 


Pfa = 1 - [n./v (Vr)] p 


(4.16) 


giving 

Po.k(\t) = (1 - Pfa)' Ip = $ ( VT ~ M °- y > ) (4.17) 

where $(•) denotes the CDF of a standard normal variable. Hence, the threshold Vt can be 
expressed in terms of the mean Mo,k, variance So.k, and the false alarm probability Pfa as: 

\r = Mo , a + Som® '[(1 - Pfa) 1 ' p ] — 'Ho , a + 5o,a0o, p (4-18) 

where 

<t>o, P = $7(1 - PfaV / p ] ^ [1 - Pfa/p] (4.19) 
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4.5 Missed detection analysis 

The probability of mis-detection is given by: 


Pmd = Pi,k(^)[Po,k(\t)Y < P\,k{Vt) 

Substituting the expression of Vp in terms of false alarm rate, we have: 

Pmd = (1 — PfaY' p P\,k (V t) 


(4.20) 


(4.21) 


giving 


Hence, 


DAVt) = .. p r._„. = •» fVr “ m, - k 


(1 - p FA y/r \ s lt 


K 


Vp — Mi,k + 5’i.k ( 1> 




M D 


(1 - P FA ) Tlp 


— M\,K — Si y K 01 , p 


(4.22) 


(4.23) 


w 


diere 


01 ,p = 


-1 


MD 


[(i - p FA ) r/ y 


= 


p> 


MD 


(i - p FA y/p j 


~ $- x [i - p, 


MD 


(4.24) 


since usually, P FA <C 1. 

Approximations of M\,k and Sf K . are obtained considering the exceeding of threshold 
only due to the signal part, and not due to the noise part. Also, it is assumed that the target 
occupies a single pixel. In such a case, we have: 


F(x, y ; u , v: k ) ~ f(x, y\ k) + aF{x. y\ u, v ; k- 1 ) ~ N{M hk , 5 ? fc ) 


(4.25) 


It can be easily shown that: 


Mi 


1 — a 


K 


1 — Q 


2K 


K 


1 1 ® s 
1 — a 1 — a z 


(4.26) 


4.6 Calculation of required SNR 

To calculate the SNR required for detection at particular rates of false alarms and mis- 
detections, equations (4.18) and (4.23) are combined to give: 

Mi,k ~ A/ 0 ,a- = Sq.k 0o ,p + S\,k 0i iP (4-27) 


48 



^n — ^s — c, equation (4.27) becomes: 
1 - oc K 


1 — Q 


// - 


aan q 


1 - a 2 a 2 q 


— (T, 


Q 

The SNR required for detection is given by: 


and ib 

h ,ft'; 

and 

‘1 -a K 

^ 

n K - 

1 — a 



a 2K a 2l< 



-a4 

00, p 

+a \ 


= //, and 


1 - oV 


ft' 2ft 1 


1 - j 


ll - n 2K 
1 — a 2 


0i,p 




Ofl q 


SNR t = -~ 

<* . /I - d 2 a 2 


1 — yrr 


ft--! 1 1 -aV 1 


1 — 


1 - j 


+ 


1 — O 


l — o ' 2h a 2h l — a 1 + a K 

0o, p + \/ , , . ' jy 0i , p 


1 — o h \ 1 — a' 2 a 2 


l+o 1 — o' 


For o=l, replacing (1 — o A )/(l — o) by K, we get 


SNR t = h -~ _ 

A 


t l Q 


— Ot 


V 1 - o 2K 
i !_ . 9 

A l—o? 


+ 


A'\| 1- 


1 ~ , 1 

00, p d 7^ 01 ,p 




v/A 7 


For K — > oo, o ^ 1 such that o ,x -C 1 
SNRt = — ~ ^ 


l—o 


'l — o 


I o — 9 d 7= = 00, p + +\/ 

1 — o a q y/l — aSa 2 V 1 + Q 

The above expressions of SNRt can be written in the form: 

S A R't = A. -\- D 0o, p + C 01, p 


01 ,p 


(4.28) 


(4.29) 


(4.30) 


(4.31) 


(4.32) 


where A, B, and C depend on A , q, and o. The terms B and C decrease with K, improving 
the algorithm performance as K increases. However, the term A increases with K, putting 
a lower bound on the required SNR, thus limiting the performance. It can be shown that 
this bound increases with q , and hence a lowest possible value of q should be used. This is 
intuitively explained, since a maximum is taken over q noise pixels and it is more likely to 
be a false alarm when q is large. 


4.7 Temporal averaging and single frame thresholding 
as special cases 

Recursive temporal averaging algorithm can be considered as a special case of dynamic 
programming with p = q = 1, for which p q = 0 and < 7 ^ = 1. Hence, the threshold SNR for 
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recursive temporal averaging becomes: 


SNR t = '-~ [0|) + ,] (4.33) 

(T ]j 1 + a l - a h 

This expression can also be obtained by using the recursive temporal averaging equations: 
F{x, y; 0) = 0 , F{x, y: k) = f{x, y\ k) + nF{x, y\k- 1) (4.34) 


Also, for a = 1, this expression takes the limit: 


SNR t - 


1 

71 


[ 00,1 + 01 , 1 . 


(4.35) 


The same result would be obtained by using a = 1 in original equations. For K — » oc, a ^ 1 
such that a K < 1, 

SNR t = Jt-j— [<?o,i + 4>\,\] (4.36) 

V i + « 

For single frame thresholding (A' = 1 or o = 0), the threshold SNR reduces to 4>o,] + 

Note that the first term from the dynamic programming algorithm disappears in these 
expressions, and there is no lower limit to the performance if a = 1. 


4.8 Theoretical performance plots 

This section describes the behavior of the required signal to noise ratio SNRt for different 
values of parameters. It should be noted that lower required SNR means better performance. 
Figure 4.1 (a) shows plots of SNRt against. I\ for dynamic programming algorithm with p = 
q = 4 and a number of values of a. The false alarm rate is 2 x 10~ 8 (0.02 per simulation for a 
1 mega-pixel image), and the mis-detection rate is 0.001. It can be seen that SNRt decreases 
with increase in K, but saturates at a certain point depending on a. Figure 4.1 (b) shows the 
corresponding plot for p = q = 1 i.e., recursive temporal averaging. Figures 4.1 (c) and (d) 

show the plots of SNRt against K with a = 1 and a — 15/16, respectively, for a number of 
values of p and q. It is observed that SNRt increases with q as expected. The SNRt also 
increases slightly with p, but the plots cannot show the change. Except in the case of a = 1 
and p = q = 1 - i.e., temporal averaging - the SNRt saturates at some minimum value as 
K — > oo. 
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Figure 4.1: Plots of SNR.r against K for: (a) p = q = 4 (dynamic programming) and 
number of a values, (b) p = q = 1 (temporal averaging) and number of a values, (c) o: = 1 
and number of p and q values, (d) a = 15/16 and number of p and q values. The parameters 
used are: FA — 2 x 10 -8 , MD = 0.001. 
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Table 4.2: Parameters used for calculating the theoretical performance of algorithms. 


Parameter 

Dynamic prog 

Single frame 

Temp. Avg. (stat) 

FA 

2x10 8 /pixel = 0.02 /image 

MD 

O.DQlj pixel 

a 

15/16 

K 

32 

1 

32 

Q 

4 

— 

1 


Table 4.3: Comparison of theoretical performance of the algorithms with observed perfor- 
mance on 2 x 2 targets. 


Algorithm 

Theoretical SNR 

Observed SNR 

Dynamic Prog 

2.4540 

2.04 

Single frame 

8.5811 

8.17 

Temp. Avg. (stat.) 

1.7507 

1.65 


4.9 Comparison between theoretical and observed per- 
formance 

The parameters used in the calculation of theoretical performance of the algorit hms for 2x2 
targets are shown in Table 4.2. The calculated and the observed SNR threshold for these 
parameters for various algorithms are shown in Table 4.3. 

One can observe that the actual performance of the algorithm for 2 x 2 targets is slightly 
better than the theoretical performance for most of the algorithms. The reason for this is, 
that a 2 x 2 target occupies at least one pixel completely, and a few other pixels partially. 
Hence, its performance should be slightly greater than the calculated performance in which 
one assumes that the target occupies exactly one pixel. 

To correct this problem, point targets were used in place of 2 x 2 targets. The experiments 
in Chapter 3 were repeated using point targets. The comparison between the calculated and 
observed SNR for a number of false alarm and mis-detection rates are shown in Table 4.4. 
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Table 4.4: Comparison of theoretical performance of the algorithms with observed perfor- 
mance on point targets for a number of different values of false alarm (FA) and mis-detection 
(MD) rates. 


Algorithm 

FA rate 

MD rate 

Theo. SNR 

Obs. SNR 

Dynamic Prog. 

2 x 10 _8 =0.02/simul 

0.001 

2.4540 

2.7172 

Dynamic Prog. 

10~ 6 =l/simul 

0.01 

2.2313 

2.2928 

Dynamic Prog. 

10 -6 =l/simul 

0.1 

2.0181 

1.9862 

Dynamic Prog. 

10 _4 =100/simul 

0.1 

1.9259 

1.8401 

Temp. Avg. 

2 x 10 -8 — 0.02/simul 

0.001 

1.7507 

1.7355 

Temp Avg. 

10 _6 =l/simul 

0.01 

1.4444 

1.4307 

Temp Avg. 

10 _6 =l/simul 

0.1 

1.2313 

1.2345 


It can be seen that the calculated and observed SNR rates agree very well in most cases. 
However, in the case of extremely low false alarm and mis-detection rates, the observed SNR 
is greater than the calculated SNR for the dynamic programming algorithm. The reason for 
this is the normal approximation used for the distribution of resulting output . 

4.10 Effect of approximations 

Approximations were used to derive the closed form expressions. In this section, the effects 
of these approximations are described. 

Normal approximation 

Normal approximation was used for maximum of q normal variables. The comparison of 
the probability density, and the complementary cumulative distribution functions of the 
maximum of q = 4 standard normal variables, and their normal approximation are shown in 
Figure 4.2. It can be seen that the approximation is good in the interior, where probability 
density is high, but is inaccurate in the tails, where the probability density is low. 

Due to the difference in these distributions, the probability of false alarms is underesti- 
mated. In fact, to get the actual value of the false alarm rate, the function corresponding 
to the actual cumulative distribution of the output F should be used in place of cumulative 
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(b) 


Figure 4.2: Probability distributions of normal approximations: (a) Probability density 
function ( p(x ) and p„(x)) (b) Complementary cumulative distribution (Q(x) and Qn(' r )) 
against x of the maximum of q — 4 standard normal variables (solid line) and the normal 
approximation having same mean and variance. 


normal distribution. But this distribution is difficult to obtain in closed form. 

To get an idea of the difference between the actual distribution arid the normal approxi- 
mation, consider the function corresponding to the complementary cumulative distribution 
Q(x ) = Pr[X > x] of the maximum of q = 4 normal variables as shown in Figure 4.2. 
For Q(x) = 10“ 8 , we get x = 5.85, whereas for normal distribution the corresponding 
Q n (x) = 10 -8 gives x = 4.95. The difference is around 18 % but is smaller for smaller values 
of x. 

At each step of the recursion, maximum of q instances of F at time k — 1 are taken and 
added to the input / at time k, to obtain the output F at time k. Hence, the distribution 
of the output F at each time should be a better approximation of normal distribution than 
Q(-), since a normal variable (/) is added to the maximum term for obtaining the output F. 
Also, since the normal approximation for F is good in the interior, the mean and variance 
of maximum of q instances of F will lie close to what is computed assuming the normal 
distribution. Hence, the mean and variance calculations are not affected much. 

Furthermore, it is observed that the threshold SNR changes are small even for large 
changes in false alarm and mis-detection rates. In any case, one would not directly use the 
false alarm and mis-detection rates during the application of the algorithm, but estimate 
these dynamically using the output, from the algorithm. 
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Approximation in false alarm estimation 

Another approximation was performed while computing the mean value M (lK of the noise 
output, used in false alarm estimation. For equation (4.12), 7 actually depends on k, which 
makes it impossible to get an exact analytical expression. It was assumed that 7 is fixed 
and approximately equal to 1/2, corresponding to a first order approximation. However, it 
is observed that the value of A/o.a' does not change much with 7 even for the extremes of 
7 = 0 or 7 = 1 . Hence, the approximation is reliable. 

Approximation in mis-detection estimation 

In the case of mis-detections, the output of the algorithm at a target point is assumed to 
be solely due to the target, without the effect of noise. The noise can add or subtract the 
target intensity. However, since maximum is taken over q pixels at every stage, bias is likely 
towards adding. Hence, the mis-detections are likely to be less than what are estimated. 


55 



Chapter 5 

A Special Approach for Hazard 
Detection 


It is well known in the pilots’ community, that an object on a collision or near-collision course 
remains stationary or nearly stationary in its 2-D image view [14]. The closest distance that 
an aircraft would approach another before moving away from it, is known as the distance 
of passage, and the time to reach that point is known as the time to passage, or time to 
‘collision’. For ensuring safety, the distance of passage should be larger than a certain limit; 
and objects with a smaller distance of passage should be detected before the time to collision 
becomes too small. It can be shown that the rate of translation of the object in the image 
is proportional to the distance of passage. Using this property, the rate of image translation 
can be used to separate hazardous objects from clutter, since the former have a smaller rate 
of translation. 

Another useful property which can be used to discriminate hazardous objects from clutter 
is the rate of image expansion, which is approximately inversely proportional to the time 
to collision of the object. Nelson and Aloimonos [15] use the image expansion in terms of 
the flow field divergence to estimate the time to collision, for separating obstacles. Francois 
and Bouthemy [7] separate the image motion into components of divergence, rotation, and 
deformation. Ancona and Poggio [1] use 1-D correlation to estimate optical flow for a time- 
to-crash detector. Baram and Barniv [3] rely on object texture to extract information on 
local expansion. Instead of estimating a numerical depth value, an object is classified as 
‘safe’ or ‘dangerous’ using a pattern recognition approach. 

Most of these methods are useful for objects of larger sizes. However, in this case, the 
object sizes can be very small, even sub-pixel, along with very small rates of expansion. 
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Hence, a feature based approach was used in this work, where features were tracked, and 
their expansion estimated over a large number of frames. 

This chapter describes the conditions under which the rates of image translation and 
expansion can be used to separate an object on collision course from the ground clutter. 
Methods to estimate the image translation and expansion are proposed and tested on real 
image sequences obtained from a camera mounted on an aircraft. 


5.1 Scene geometry 

Consider an object approaching towards the aircraft with a relative velocity of V as shown 
in Figure 5.1 (a). Let p be the distance of passage i.e., the closest distance that the object 
approaches the camera - and <f> be the angle between the line of sight of the target and the 
relative velocity vector V'. Let r denote the time to passage (or collision) which is the time 
the object takes to reach the distance of passage. The object distance is r, whereas distance 
that the object travels until it reaches the point of passage P is 2 . 

5.2 Detection using translation 

As the object moves, the angle 0 as well as distances r and 2 change, but the distance of 
passage p is constant. The rate of angular translation of an object in the image is T = <£. 
The pixel translation is approximately given by multiplying the angular translation by the 
focal length. By geometry of Figure 5.1 (a), we have: 

z — p cot. (j) (5.1) 

To find the rate of translation <£, this expression is differentiated to get: 

z = -p( esc 2 0)0 (5.2) 

The magnitude of the relative velocity V is the rate of decrease of 2 , given by: 

V = -2 = p{r/p) 2 4> (5.3) 

Also, the time of passage is given bv: 

t = z/V = r COS0/V' (5.4) 
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(b) 

Figure 5.1: Geometry of (a) target (b) background moving relative to the camera. 
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(5.5) 


From equations (5.3) and (5.4), the rate of target translation is given by: 


T = 0 = 


pV p cos <j> 


Thus, the rate of image translation is proportional to the distance of passage, and the 
objects on a collision course are likely to have a smaller rate of translation compared to other 
objects. However, this rate is also dependent on the target distance, and a nearer target 
moves faster in the image than a farther target with the same distance of passage. If S min is 
the smallest visible dimension that an object can have, the corresponding size in the image 
is given by: 

S min — S m j n /v (5.G) 

Hence, from equation (5.5), one can write: 

T p cos 0 p 

~ < < ~4— (5.7) 

$ • min T ^ min 

Hence, an object on a near collision course, having sufficient time before imminent collision 
has the ratio of its image motion to its image size bounded by the above pro-computable 
limit. For example, if the distance of passage of p = 150///. (500 ft) is allowed, and an object 
of smallest size of S min = 1.2 m (4 ft) is to be detected before r = 25 seconds (750 frames), 
then this ratio becomes 1/6 i.e., the image motion per frame is at the most 1/6"' of the 

image size of the object. However, in actual practice, a larger range of velocities should be 
checked, to have a safety margin. 

It should be noted that the above relationship is valid only if the aircraft does not rotate 
or vibrate around its own axes. If there is rotation, it should be compensated bv using the 
data from the aircraft navigation system. In the absence of this data., it may be possible to 
use image features due to clutter (if available) to perform the compensation, by modeling 
their image motion due to camera rotation. 

If this compensation is successful, the velocity to size ratio of the object would be 
bounded. By reducing the image resolution to an appropriate level, the image velocity 
of the object would also be restricted. Hence, using pyramid construction, target detection 
can be performed at a number of resolutions, and the suitable resolution selected. This also 
leads to spatio-temporal integration of the image data and the amplification of SNR which 
could enable detection of sub-pixel or low-contrast objects in uniform background, such as 
clear or overcast sky. 

The relationship between image motion and the distance of passage can be used to remove 
the clutter which is not on collision course and thus expected to have a large image motion. 
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However, the image motion is inversely proportional to the distance of the object from the 
camera. Thus, if clutter is at a large distance, it too could have a small image motion. The 
conditions under which an object on the collision course can be distinguished from ground 
clutter at the same image position are derived below. 

Let ro and p 0 denote the background distance, and the minimum distance of approach for 
the background, respectively, as shown in Figure 5.1 (b). The relative velocity Vo between the 
camera and the background is actually the magnitude of the camera velocity, By substituting 
these parameters in equation (5.5), the rate of background translation can be written as: 


T 0 = 


P (do 


(5.8) 


Let h 0 = h c — hb denote the difference between the camera altitude h c and the background 
altitude hf,. Also, the angle of the camera velocity above the horizontal (not horizon) is o. 
From Figure 5.1 (b), we have: 


r 0 = d see 6 (5.9) 

Po = r 0 sin(/9 + «) (5.10) 


Here, d is a function of the relative height ho and the angle 9. If the earth were flat (or 9 is 
large), refraction of light is negligible, and the terrain is smooth, the dotted line corresponding 
to d would coincide with the surface of the earth, and we would have 


d = hn cot 9 


However, if we express: 

d(h o ,0) = h 0 cot9f(ho,9) (5.11) 

then the effects of the earth’s curvature and refraction of light ray would be incorporated 
in the function /. If these factors can be neglected, then f{ho,9) ~ 1. The expression for 
/ using the curvature of the earth is derived in Section 5.4. Also, using equation (5.9), one 
can write: 

r 0 = h 0 esc Of {h 0 , 9) (5.12) 

Substituting equations (5.10) and (5.12) in (5.8), the rate of background translation T 0 is 
given by: 


To 


Vo sin(0 + a) 1 0 sin(0 + a) sin 9 


(5.13) 


r 0 hof {ho, 9) 

If the hazard is to be discriminated from the background in the same line of sight, the 
rate of translation of the hazard must be much smaller than that of the background i.e., 
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T < rp % with t)i > 1, having a larger value for greater discriminating power. Using 
equations (5.5) and (5.13), we have: 


poos 0 ^ V'o sin(0 + a) sin 9 
rr ~ 1h h 0 f(h lh 8) 

Hence, the object distance r should be larger than the following expression: 

ntphof{h 0 , 9) cos 0 rj t pDf(h 0 , 8) yU - Q' 2 


(5.14) 


r > 


rV o sin (8 + a) sin 9 


sin (8 + o) sin 9 


with 


D 


= —rr , Q = - = sin 0 , cos 0 = Jl — Q 2 ~ 1 (for »«r) 

Tin V s 


Hence, 9 should satisfy: 


sin(0 + o) sin 8 > r h DQsJl - Q\f(h, h 8) 


(5.15) 


(5.16) 


(5.17) 


Also, using T < rj t T 0 , with equations (5.5) and (5.13), one can write: 

g^<„-, V’,si"(9 + a ) (518) 

rr r 0 

Since the object distance cannot be greater than the background distance in the line of sight, 
r < tq. Hence, one can also write: 


• fo , x . 'lip cos (!) r 0 thp\/1 - Q 2 
sin(w + aj > rr > 


(5.19) 


t Vo r r \ o 

For p < r or Q < 1, this condition is approximately independent of r. It can be said that 
for detection to be possible at all for a particular 8 and o, the above condition is necessary 
irrespective of the target distance r, provided it. is sufficiently large. 

If the curvature of the earth and the refraction of light can be neglected, then / ~ 1. The 
necessary condition in equation (5.19) does not simplify. However, equation (5.17) reduces 
to: 

sin(0 + O') sin 8 > //, DQ\J l — Q 2 (5.20) 

On solving for 9, this yields: 


8 > - 
- 2 


cos 


-2 ntDQy/l - Q 2 + cos o) — o 


If o = 0, the solution for 8 is simpler: 


9 > sin 1 yypDQyJ 1 - Q 2 


(5.21) 


(5.22) 
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For example, if we have: 

p = 150 m, r = 25 s, V 0 = 150m/.s, /? 0 = 1 km, cv = 0, // ( - 2.5 (5.23) 

For these values D = 0.267, and from equation (5.19) the necessary condition is 0 > 5.7°. 
This condition corresponds to the target being at the same position as the background, 
which is r = ro = 10 km ~ 5.4 nrni or Q — 0.015, using equation (5.12). However, if the 
target is nearer, the condition on 0 is determined by equation (5.17) or (5.20). For example, 
if a hazard should be detected at r - 5 km — 2.7 nmi or Q = 0.03, one would really need 
6 > 8.1°. The required 6 increases as r decreases. 


5.3 Detection using expansion 


Another discriminating feature between objects on collision course, and objects much farther, 
is the time to collision. It is well known that the rate of image expansion, i.e., the increase 
of the image size of an object is inversely proportional to the time to collision. 

In Figure 5.1 (a), as the object comes closer to the camera along the line of z, its size in 
the image will become larger. The rate of this expansion of any object is defined as the ratio 
of the rate of increase in its size to the size at that time, i.e., E = s/s - where s is the 
size of the object in the image. Since s = S/r where S is the object size which is assumed 
constant, we have s = —Sr/r 2 , and 

E = —r/r (5-24) 


By geometry of Figure 5.1 (a), 


r 2 = 


f 2 + P 2 


To find the rate of expansion, this expression is differentiated to yield: 


(5.25) 


2 rr = 2zz = -2 zV (5.26) 

Hence, rate of target expansion is given bv: 

r zY T cos (p cos 2 o 

E= — = — = = o.27 

r r l r t 

where the time to passage is: 

t = z/V - r cos o/V (5.28) 

For t = 25 s = 750 frames, the ratio is 0.13 % per frame, which is a very small magnitude. 
This small expansion can be measured by tracking it over a large number of frames. 
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For estimating the rate of expansion of the background, the corresponding parameters 
for the background are substituted in equation (5.27) to give: 


E o = 




7 


.2 

0 


(5.29) 


Using z 0 — r 0 c os(0 + a) with equations (5.9) and (5.11), the rate of background expansion 
can be written as: 


V’o eos(0 + o) 1 o eos(0 + o) cos 0 U 0 cos(0 + O') sin 0 

E ° = 3 = i 771 rn (5.30) 

r 0 d hof(h 0 , 0) 

If reliable discrimination of the hazard from the background in the same line of sight 
is required, the rate of expansion of the hazard must be much larger than that of the 
background, - i.e.. E > r) e Eo with r/ P > 1. having a large value for greater discriminating 
power. Using equations (5.27) and (5.30), one needs: 


cos 2 4> 

> >h 

T 


I ’o cos ((9 + rv ) sin 0 

hJIhJ) 


(5.31) 


° r 

cos(0 + a) sinO < ^ 0 -^°’^ ca ^ - Q 2 )f(ho.0) (5.32) 

0e T > o 

w^here D and Q are given by equation (5.16). For the case of / ~ 1. the equation (5.32) 
reduces to: 

cos(# + ck) sin 0 < i}~ ] D( 1 — Q 2 ) (5.33) 

Explicit solution for 0 is then given by: 

0 < ^ [sin -1 (2r)~ l D(l - Q 2 ) + sin a) — aj (5.34) 

For the conditions in equation (5.23), we need 0 < 6.2° for reliable detection using expansion. 

It should be noted that, the expansion in image size can also be caused by the rotation of 
the target aircraft in a way which would expose a larger area to the camera. However, this 
false expansion takes place only in the direction perpendicular to the axis of rotation of the 
target aircraft, whereas the expansion due to a potential collision would take place uniformly 
in all directions. Also, the target expansion will cease after the aircraft fully rotates to a 
position where maximum area is exposed to the camera. It may be possible to use these 
properties to discriminate between the false expansion and the expansion due to a collision 
course. 
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( 0 , R + h 0 ) 



Figure 5.2: Geometry of earth’s curvature: The coordinates used are with respect to earth’s 
center. 


5.4 Effect of horizon 


In this section, function describing the effect of the curvature of the earth is calculated, 
neglecting the effects of refraction. Figure 5.2 shows the geometry of the earth’s curvature. 
The coordinates used are with respect to earth’s center. Using this, we have: 

d = ftsiny, dtanfl = h 0 + R( 1 — cosy) ~ h 0 + d 2 /(2R) (5.35) 


where R = R 0 + hb, hb is the altitude of the background, R 0 is the radius of earth, and 7 is 
the angle subtended on the center of the earth by the triangle. Solving this equation yields: 


d= R 


tan 9 ± \J tan 2 0 — 2/i 0 //? 


(5.36) 


The correct solution is the smaller value of d, since the larger value represents the other 
intersection of the line of sight with the earth. 

2h 0 


d, = R 


tan 9 — \J tan'- 0 — 2/i 0 / R 


tan 6 + ^tan 2 9 — 2ho/R 


(5.37) 
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By substituting in equation (5.11), we have: 


2 


f{ho,0) = 


1 + yjl - 2/?. 0 /(/?tan 2 9) 


(5.38) 


If 9 ~ 7r/2, or R is large, /? is small, then / ~ 1. i.e., the earth’s curvature can be neglected. 

However, where the line of sight just touches the earth i.e., at the horizon the discriminant 
under the square root is zero, then / = 2 and the corresponding 9 is: 


0h = tan 


vMo//? 


(5.39) 


Any value of 9 smaller than this value corresponds to the line of sight not touching the earth 
i.e., background above the horizon. 


5.5 Behavior of translation and expansion 

Figure 5.3 shows the variation of the required 9 with the horizontal, for the possibility of 
detection using translation and expansion, against, various parameters. Effect of horizon 
was neglected since it was observed that it does not affect the plots to a significant extent. 
The minimum 9 for detection using translation, which is shown by dashed line, whereas the 
maximum 9 for detection using expansion is shown by dotted line. However, the minimum 
9 criterion is only the necessary criterion. For actual discrimination using translation for an 
object at a given distance, a larger 0 is required. The other curves show the required 9 for 
detection using translation for various object distances in meters, and are enveloped by the 
dashed line curve. 

Most of the information in these curves can be condensed using the parameter D = 
h 0 /{TV 0 ). Figure 5.4 (a) shows the contours of same D for different values of Vq and ho for 
t = 25 s. Plots of required 9 for translation and expansion using a number of values of the 
target distance r in km, for the distance of passage p = 150 m are shown in Figure 5.4 (b). 
However, the necessary criterion for translation cannot be expressed using these plots. 


5.6 Estimation of translation and expansion 

To reduce the computational complexity of estimating the translation and expansion, a 
feature-based approach was used. A morphological filter [6] which subtracts the opening 
and closing of the image from the original image was used to detect positive and negative 
features, corresponding to light and dark objects, respectively. 
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(c) (d) 

Figure 5.3: Variation of the required 0 with the horizontal, for the possibility of detection 
using translation and expansion, against a number of parameters: (a) Camera velocity: 1 o- 
(b) Relative height between camera and background: ho, (c) Distance of passage: p (d) Time 
of passage (or collision): r. Default values of the parameters (except when they vary) are: 
Vo = 1 km/s, h 0 = 1 km, p = 150 m, and r = 25 s, and r] t = rj e = 2.5. The minimum 0 for 
detection using translation is shown by dashed line, whereas the maximum 0 for detection 
using expansion is shown by dotted line. The other curves show the required 0 for translation 
for various object distances in meters. 
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(a) (b) 


Figure 5.4: Plots for detection using translation and expansion: (a) Plot showing the contours 
of same D for different values of Vo and ho for r = 25 s (b) Plots of 6 required for detection 
using translation are shown with various symbols for a number of values of the target distance 
r in km, for the distance of passage p = 150 m. Plot of the 9 required for detection using 
expansion is shown with a dashed line. 


To estimate the translation of the features over a number of frames, they were tracked 
over a number of frames. In case of navigat ion system data being available, the posit ion of the 
features were compensated before performing the tracking. A nearest neighbor approach was 
used to determine the corresponding feature in the next frame, and the smoothed estimates of 
the feature position and velocity in each frame were obtained using Kalman filter approach. 
This procedure is similar to the one described in Chapter 7 used for detecting targets crossing 
the aircraft. 

For detecting expansion, a 15 x 15 window around each feature was explored. The 
sub-image corresponding to the window was thresholded, and the connected component 
containing the center of the window was found. All the pixels in the sub-image that did 
not belong to the component were set to zero. To estimate the size of the component, the 
sub-image was convolved with a number of smoothing masks. These masks perform matched 
filtering with a object templates corresponding a number of different sizes. The maximum 
output from all these masks was considered as the measure of target strength. The rate of 
expansion was measured in terms of increase of the target strength, tracked over a number of 
frames. The target strength was plotted against the frame number, and the rate of expansion 
was estimated by applying least squares to the logarithm of the target strength. 
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5.7 Results 


The estimation of translation and expansion was performed on a sequence of images captured 
from an analog camera in which the target aircraft is approaching the camera. Figure 5.5 (a) 
show's a typical frame from the sequence. Figure 5.5 (b) and (c) show the target track in 
the original and the motion compensated images, respectively. Figure 5.5 (d) shows the plot 
of the estimated target size against the frame number. Corresponding plots for two clutter 
tracks are shown in Figures 5.6 and 5.7. It can be seen that the target expansion is the 
large for the target track, and small for the clutter tracks. On the other hand, the rate of 
target translation is small for the target track and large for the clutter tracks. Figure 5.8 
shows the significant tracks before and after motion compensation. A scatter plot of the 
feature expansion against translation for these tracks, including the target track is shown in 
Figure 5.9. The rate of translation is measured in terms of the displacement magnitude of 
the compensated features in 100 frames, whereas the rate of expansion is measured in terms 
of the increase in the logarithm (to base 10) of the target strength in 100 frames. It is seen 
that the target has a large rate of expansion and a small rate of translation and is located 
in the upper left corner of the scatter plot. 
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(<•) (d) 

Figure 5.5: Translation and expansion for target track: (a) Sample image from the last 
frame, (b) Target track (c) Target track after compensation. Rate of translation is small 
for target track, (d) Plot of expansion against frame number. Rate of expansion is large for 
target track. 
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(c) (d) 

Figure 5.6: Translation and expansion for clutter track: (a) Sample image from the last, 
frame, (b) Target track (c) Target track after compensation. Rate of translation is large for 
clutter track, (d) Plot of expansion against frame number. Rate of expansion is small for 
clutter track. 
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(c) (d) 

Figure 5.7: Translation and expansion for another clutter track: (a) Sample image from the 
last frame, (b) Target track (c) Target track after compensation. Rate of translation is large 
for clutter track, (d) Plot of expansion against frame number. Rate of expansion is small 
for this clutter track. 


71 







translation score 

Figure 5.9: Scatter plot of the feature expansion against translation: The rate of translat ion 
is measured in terms of the displacement magnitude of the compensated features in 100 
frames, whereas the rate of expansion is measured in terms of the increase in the logarithm 
(to base 10) of the target strength in 100 frames. The target is marked as an encircled 
asterisk, and is in upper left corner, having a small rate of translation and a large rate of 
expansion. 
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Chapter 6 

Algorithm Fusion 


Each of the target detection algorithms has its own advantages and limitations. Hence, 
a combination of these algorithms may be used in the ultimate design to overcome their 
individual limitations while maximizing their advantages. This chapter describes a method 
for combining the algorithms using statistical approach to optimize the performance in terms 
of the mis-detection and false alarm rates. In particular, the pre-processing algorithms of 
low-stop and morphological filters, described in Chapter 2 are combined. The performance 
of the fused algorithm is compared with the original algorithms using the methodology 
described in Chapter 3. 


6.1 Combination of algorithms using a statistical ap- 
proach 


According to the Neyman Pearson criterion, the optimal Bayesian detector which minimizes 
the rate of mis-detection for a particular rate of false alarms is obtained bv thresholding the 
joint likelihood ratio of the individual detector outputs, or some monotonic function of the 
same. The threshold should be such that the desired false alarm rate is obtained. 

Consider the joint likelihood ratio of the low-stop and the morphological filter. Let 2 -- 
(z 1? z 2 ) be the 2-D vector denoting the outputs of the low-stop and the morphological filters, 
respectively. Let p{z\H$,C) and p(z\H],C) denote the joint probability density functions 
for the hypotheses denoting the absence and presence of a target, respectively, for clutter 
level estimate C. The likelihood ratio is then given by: 


Lh,c{z ) 


p(z\H u C) 

p(z\H 0 ,C) 


( 6 . 1 ) 
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6.2 Statistical behavior of low-stop and morphological 
filters 

In the following analysis, it is assumed that the input image pixels are described by the 
sum of the signal 9, background level 0, and the camera noise u, which is modeled as an 
uncorrelated Gaussian noise of zero mean and variance ij 1 . 

x = 6 + 0 + v (6.2) 

If there is no clutter, the distributions of x in absence and presence of the target are given 
by: 

p(x\H 0 ) ~ N(0, if) , p(x\H x ) ~ N(B + 0, rf) (6.3) 

If clutter is present, the exact distributions would depend on the nature of the clutter. 
Here, it is assumed that the presence of clutter changes the mean background level, and the 
variance parameter of the noise, making these parameters space varying. 

Low-stop filtering is performed by subt racting the low-pass filtered image, using a weighted 
spatial average of the neighborhood, from the original image. This filter attempts to sub- 
tract the background level. Since it is a linear filter, if the input is normally distributed, the 
output zi will also be distributed as: 

p(z t \H 0 ) ~ N(0,af) , p(zi\H { ) ~ N{m,of) (6.4) 

with 

<7/ = .ftp , pi = gfi (6.5) 

where ft and gi are the amplification gains in the standard deviation and mean due to the 
filter. It should be noted that the background level 8 is subtracted out by the filter. 

Morphological filtering is performed by taking the difference between the original image 
and its opening (positive targets) or closing (negative targets). Without loss of generality, 
only positive targets are considered, which are detected by subtracting the opening from 
the original image. This is expected to remove uniform background, as well as most of the 
clutter. 

To obtain a model for the distribution of the morphological filter and to verify the distri- 
bution of low stop filter, simulations were performed. A large number of floating point images 
containing Gaussian noise were generated. Low-stop and morphological filter were applied 
to these images, and the histograms of the filter outputs were obtained. Figure 6.1 (a) shows 



the histogram of the original image with Gaussian noise. Figure 6.1 (b) shows the histogram 
of the low-stop filter output, which is normally distributed with zero mean, as expected. 
The histogram of the morphological filter output is shown in Figure 6.1 (c). It can be seen 
that the histogram resembles a normal distribution with a positive mean. However, since 
the opening of an image is always less than or equal to the original image, the filter output 
is always non-negative. Hence, the distribution is truncated on the negative side, and has an 
impulse at zero in place of the negative values. For clarity, the distribution after removing 
the impulse is shown in Figure 6.1 (d). 

This distribution can be modeled by using a hypothetical normally distributed variable 
~ N (thru ^m)- The output z m of the morphological filter can be expressed in terms of 
as: 

z m = max(£ m , 0) (6.6) 

It can be shown that the explicit distribution of z m is given by: 

p(z m \H 0 ) = + r^m) $ (-^) (6.7) 

where u(-) is the unit step function, S(-) is the Dirac impulse function, and G(-) and <£(•) are 
the probability density and cumulative distribution functions of a standard normal variable, 
respectively. It can be shown that the mean and variance of this distribution, which are 
different from the parameters (i m and a 2 m , can be expressed as: 

i hn T ( 1 1 in / o !u ) + a m G(p m / a m ) 

~ Hm) ( 6 . 8 ) 

Hence, the parameters // m and a m can be obtained from the observed values of ra„, and 
by using a numerical method. It can be shown that this procedure yields the maximum 
likelihood estimates of the parameters. The parameters derived from the above simulations 
are shown in Table 6.1. 

To obtain the distribution in presence of a target, a number of simulated targets of 
fixed amplitude were added to each of the images generated above. Morphological filter was 
applied to these images, and a histogram of pixel values only at the target positions was 
obtained. However, since the number of targets is not as large as the total number of pixels 
in the image, the histogram is less reliable than in the case of absence of targets. These 
experiments were repeated for various signal amplitudes and the sample mean and variance 
of the outputs were computed. The sample means and variances were taken as the estimates 
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Figure 6.1: Statistics of low-stop and morphological filters: Histograms of: (a) Input image 
with Gaussian noise, (b) Output of low-stop filter, (c) Output of morphological filter, (d) 
Output of morphological filter after removing impulse at zero value. 
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Table 6.1: Statistical parameters of low-stop and morphological filters derived from simula- 
tions 


Parameter 

Value 

9 

10.0 

m = im 

-6.6e-08 ~ 0.0 

S/ = (J[ 

9.9815 ~ 10.0 

rn m 

7.0539 


/ .8352 

ft in 

4.6293 


10.8423 


of the means and variances of the distributions. For the low-stop filter, the parameters /q 
and 07 coincide with the distribution mean and variance m t and sf, respectively, and are 
approximately equal to the signal amplitude 9 and the input noise standard deviation r?, 
respectively, corresponding to (ji ~ 1 and /) ~ 1. For the morphological filter, the actual 
parameters and a m of the underlying normal distribution were calculated from the 
and sf using the simultaneous equations (6.8). It was observed that the parameter a m 
is approximately equal to the noise intensity r/, and does not change much with the signal 
amplitude 9. However, the parameter increases non-linearlv with 6. It has a positive value 
at 6 = 0 i.e., noise-onlv condition - and increases with a lower rate than the corresponding 

low-stop filter parameter /q. Figure 6.2 shows the plots of the parameters p; and fi m against 
the signal amplitude 9. 

The output of the morphological filter is invariant to the constant background level 3. 
Furthermore, it also suppresses the clutter. Hence, the effective ‘noise’ intensity for the 
morphological filter would be different from that for the low-stop filter in case of cluttered 
scenario, and is denoted by i] m . However, in the case of the above simulations it is the 
same as the original noise intensity r/. If r/ m as well as the signal amplitude 9 are scaled 
by a constant factor, o m and /r w will get scaled by the same factor. Hence, outputs of the 
morphological filter for any general // m can be written as: 

, - ( 6 \ - ° m l f m0 \ (C Q\ 

Vm Jm i 9m 9m 9m I I t 9m I I (6-9) 

\9mJ Jrn \ J 

where f m is the gain in standard deviation (/ m ~ 1), neglecting the dependency on the target 
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SNR, and g m { ) is the gain in mean, depending on the target SNR 0/r] m . The function f m 
can be obtained from the experimentally determined values of //„, and a m for ;/ = 10, plotted 
in Figure 6.2. 

It was also observed that there is a correlation between the outputs of the low-stop 
and the morphological filters. Hence, tin* joint distribution of the two outputs is modeled 
as a normal distribution, truncated for the morphological filter. Assuming a hypothetical 
random variable £ = (£/, £ m ) ( which is normally distributed, the actual output vector z can 
be expressed as: 

^ = (zh ZmY = (£/, max(£ m , 0))' (6.10) 

The parameters of distribution of z are: 

IH ^ potVm 

t l = • £ = 2 

Pin P&l&m 

where p is the correlation coefficient, and of and of, are the individual variances of z t and 
z m , respectively. The distribution mean and covariance matrix are given by: 


( 6 . 11 ) 


m _ s'f p’ s i s m 

rn m p's t s m s 2 m 


(6.12) 
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Note that due to the linearity of the low-stop filter, we have m/ — p t , s t = a t . 

However, using these relations, it is analytically difficult to calculate the actual correlation 
coefficient parameter p from the observed correlation coefficient //. Furthermore, such a 
computation would have to be repeated for every pixel, which is highly inefficient. Hence, 
the value of p = p' is currently being used. 

Using the above models of low-stop and morphological filter outputs, the distribution of 
z for z m > 0 is given by: 

p(z\H 0 ) = |27r£| -1/2 exp [(z - /xofE" 1 ^ - Po)/2] 

p(z\Hi) = |27rE| -l/2 exp [(z - /x fl ) f E _1 (z - pe)/2^ (6.13) 

For z m < 0, p(z\H r ) = 0 Also, there is an impulse function at z m = 0, so that the integral of 
p becomes unity. 


6.3 Bayesian fusion of multiple filters 


The combined likelihood ratio of the two filters is given by: 

p(z\H u C) ~ N([i0, Ec) 
H ' C ^ } p{z\Hq, C) N(p 0 ,E c ) 


(6.14) 


where pe and po are 2-D vectors denoting the mean outputs of the algorithms in presence 
and absence of target. The covariance matrix Ec, which depends on the clutter level, can 
be estimated using the image, and will be denoted by E for brevity. The same covariance 
is used for the presence and absence of the target, since it. is experimentally observed that 
there is not much difference between the respective covariances. 

Using equation (6.13), the log likelihood ratio (LLR.) is given by: 

/(z) = log L h ,c{z) = ”(2 - /q?)'E -1 (z - Pe) + |(z - //o) t E _1 (z - // 0 ) 

= (Pe ~ /io) t S- , (z - po) - \{pe ~ PoYX-'ipg - po) (6.15) 

The parameters of the LLR in absence of target i.e.. E[z\H 0 \ — //. 0 , V' [z | // 0 ] = E - can be 
computed as: 

E[l(z)\H 0 \ = (p 9 - Po) l X- l E[z - /t 0 |^o] - \(pe ~ PoY^iPe ~ Po) = Af 
V[l(z)\H 0 ] = (p e - p*) t Y.- x V\z - po\Ho]E~ x (pe - p Q ) = d 2 (6.16) 
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where d known as the deflection coefficient [13] is the generalization of the signal to noise 
ratio for multiple dimensions. 

d=y / (pe - Po - p 0 ) (6.17) 

When the target of any strength is present, the variance parameter still remains the same 
but the mean parameter changes. For the target strength such that E[z\H\] = p#, the LLR 
parameters are given by: 

E[l(z)\Hi] = (pe - noYZ~ l E[z - p 0 |//i] - ^(p<? - Po ) t S _1 (p 0 - p 0 ) = 

F[i(.z)|ffi] = (p* - Po^M* - Pfl|/7i]S _1 (p6i - Po) = d 2 (6.18) 

It is seen that the mean and variance of the LLR are dependent on the mean and variance 
parameters of the filter outputs. Due to this, the probability of false alarm and mis-detection 
also depends on these parameters. Accordingly, two approaches of obtaining a detector are 
shown below. 


6.3.1 Constant False Alarm Rate (CFAR) detector 


To get a constant false alarm rate irrespective of the local variance, the LLR is normalized 
so that it would have a zero mean and unit variance in absence of the target. The resulting 
function is given by: 


D{z) = 


l(z) - E[l(z)\H 0 } (p fl - po yz-Hz - po) 


f V[l(z)\H 0 ] 


TPs - Po)'£ -1 (P 0 - Po) 


(6.19) 


This is a matched filter, which matches the 2-D outputs from low-stop and morphological 
filters, to the expected outputs of these filters. Since D(z\Ho) ~ A r (0, 1), if a threshold r is 
applied, the false alarm rate is given by: 


1 - $ 


t-E[D(z)\H 0 

V[D(z)\H 0 ] 


1 - <F(t) 


( 6 . 20 ) 


where <!>(•) denotes the cumulative distribution of a standard normal variable. Note that 
this is now independent of any parameters. In presence of a target so that E{z\H\] = p^, it 
can be easily seen that D(z) ~ N(d, 1). Hence, the mis-detection rate is given by: 


Pm n — $ 


t-E[D{z)\H x ] 

V[D{z)\H y \ 


= $(r - d) 


( 6 . 21 ) 
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The CFAR approach attempts to maintain a constant false alarm rate all over the image, 
irrespective of the local variance. Hence, it would be useful if a constant false alarm rate is 
required in all parts of the image, for example, if the parts are processed separately on parallel 
processors. To check the conditions under which this filter is optimal, the log likelihood ratio 
l(z) is written in terms of the discriminant function D(z) as: 

l(z) = d D{z) - d 2 / 2 (6.22) 


It can be seen that, l(z) and D(z) are monotonic to each other when the deflection coefficient 
d, given by equation (6.17) remains constant. Under such conditions, thresholding D{z) is 
equivalent to thresholding l(z). the latter being the Bayesian optimum. The deflection 
coefficient is dependent on the covariance of the noise, as well as the target strength, and 
is the generalization of SNR for multiple dimensions. Thus, if the variance parameters of 
the individual filter outputs, as well the target, amplitudes, are constant across the image, 
this approach is optimal in terms of the false alarms and mis-detection rates. However, 
in practice, the parameters (especially the low-stop filter output variance) do depend on 
the clutter level. In such a case, if the target amplitude is constant throughout the image, 
the CFAR approach is not optimal. However, if the criterion for good detection is to detect 
targets having a particular SNR i.e., stronger targets in cluttered regions but weaker targets 
in uncluttered regions the CFAR approach can be considered optimal. 

It can be seen that D(z) is dependent on the target amplitude 0 through /.ig — // 0 , as 
well as d. If hq — p 0 is a linear function of the target amplitude 9, it would cancel out in 
equation (6.19) and D(z) would become independent of the signal amplitude 8. However, if 
/j,g — Mo is non-linear, the filter would be optimal only under specific conditions. 

The false alarm rate is determined by the threshold r, whereas the mis-detection rate is 
also determined by the deflection coefficient d. Consider optimizing the matched filter for 
a particular d, in an environment with clutter covariance E. If 9 is the signal amplitude, 
equations (6.5) and (6.9) yield: 


Me - Mo 


9i9- 0 

|_ (.Mm ( ^7 ) ~ Mm (0)) Mm 


MW - MiO 

9)8 

1 ^ -Mm(0)) J 

Using this expression, the following equation should be numerically solved for 8 by evaluating 
Me — Mo using equation (6.23) with the particular d. 


(6.23) 


(Me - Mo )‘S 1 {fig - Mo) = d 2 


(6.24) 
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However, if the covariance matrix S varies throughout the image, this procedure would have 
to be carried out for all pixels, which would be highly inefficient. Furthermore, the procedure 
optimizes only for a particular value of d. 

Alternatively, if one assumes that d and 9 are small, one can optimize the fusion using a 
Locally Most-Powerful (LMP) test [13, 17]. For small value of 6 , we have: 


de - no ~ 


ft) •«= 

at 


. sLW . 


9 = s9 


(6.25) 


where s is 2-D vector independent of 9. The expression is now linear in 9 , and the discriminant 
function D(z) becomes independent of 9. 


with 


D(z) 


'(z^/Xp) 

\/s*E _1 s 



(6.26) 


(6.27) 


6.3.2 Direct thresholding of Log Likelihood Ratio (LLR) 

As shown in the previous section, if the amplitude of the signal to be detected is fixed 
irrespective of the local variance, the overall mis-detection rate for a given overall false 
alarm rate is not minimized by the CFAR approach. In fact, there cannot be a single 
optimal detector for all amplitudes. Hence, the fusion should be optimized for a particular 
amplitude. A criterion for choosing this amplitude is described below. 

Suppose that some particular minimum rates of false alarms as well as mis-detections are 
required for the algorithm. The amplitude corresponding to the minimum possible variance 
- i.e., the variance of the camera noise without clutter can be used to tune the fusion. If 
the actual amplitude is smaller than this amplitude, even an optimal detector tailored to 
that amplitude will not give the required false alarm and mis-detection rates. On the other 
hand, since the performance of the detector increases monotonically with the amplitude, a 
larger amplitude yields a better performance, though it may not be optimal. 

Suppose the LLR threshold is r. Using the mean and the variance of the LLR in absence 
and presence of the target, given by equations (6.16) and (6.18), the false alarm and mis- 
detection rates can be computed as: 


FA 


1 ? p MD 


$ 


V - <?/ 2 ' 

d 


1 - d> 




(6.28) 
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If one denotes: 


0o = $ _1 (1 - Pfa) = I+ ^ /2 , 0, = $-'(1 - Pmd) = d2/2 T (6.29) 
then t can be eliminated to obtain: 

0o — 0i =d = yfine - Hq YT rHjif) - // 0 ) (6.30) 

The target amplitude can be chosen such that //,« corresponding to it satisfies this equation, 
using £ -1 under noise only conditions. 

6.4 Application on images 

To apply this procedure on images, the statistical parameters are computed in an annular 
31 x 31 window around each pixel, where an 11 x 11 window immediately around the pixel is 
excluded to reduce the biasing of parameters when the target, is present at the pixel. There 
is a trade-off between using larger sized window giving more reliable estimates, and smaller 
sized window giving better localization in case of space varying clutter intensity. The window 
size used here was arbitrary. However, use of different window sizes can be explored to find 
the optimum window size. 

Efficient methods are used to estimate the distribution mean m and the covariance S at 
each pixel of the low-stop and morphological output images. From these, the estimates of // 
and £ are calculated and stored as images. However, in some experiments, fixed values of // m 
and a 2 m were used for the morphological filter, since the estimates are less reliable, but do not 
change much over the image (unlike low-stop filter, where these parameters heavily depend 
on the clutter). The template signal for the matched filter is calculated using equation (6.27). 
and the matched filter is applied separately to each pixel. 


6.5 Results 

The algorithm fusion approach was evaluated using the performance characterization ap- 
proach of Chapter 3. Background images obtained from digital and analog cameras shown 
in Figures 6.3 (a) and (b), respectively, were used for false alarm analysis. For mis-detection 
analysis, a number of targets of size 2x2 were added to these images. Low-stop and mor- 
phological filters were applied to these images. The outputs of these filters were fused using 
the two approaches described above. The local variance of the low-stop filter output, which 
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is a measure of clutter, is shown in Figures 6.3 (c) and (cl). The histogram of the local 
variance is shown in Figures 6.3 (e) and (f). It is seen that the analog camera image has a 
much higher clutter level than the digital camera image. 

For the Constant False Alarm Rate (CFAR) fusion, the Locally Most Powerful (LMP) 
test was used. This gave the matched filter template as: 

s=(.^<4(0)) f = (1.0,0.8623)' 

gives a slightly lower weight to the morphological filter when the level of noise is same for 
both the filter outputs. The plots of the mis-detections against the false alarms for the digital 
camera images are shown in Figure 6.4 (a) and (b), These use the assumption that the outputs 
of the low-stop and morphological filters are correlated. Algorithm fusion was also performed 
assuming independence between filters i.e., p — 0. The independence assumption gave a 
slightly better performance for the fused filter as shown in Figure 6.4 (c) and (d), possibly 
because the correlation between filters may not. have been adequately modeled. Similar plots 
using analog camera images are shown in Figure 6.5. 

In both the cases, it is seen that the fused output does not give optimal performance 
for all the rates of false alarms. However, it can be observed that the fused output does 
give larger weight to the filter which has a better performance in the particular case. For 
example, in the case of digital camera images having relatively low clutter, (Figure 6.4), the 
better performing low-stop filter is given higher a weight. On the other hand, for analog 
camera images (Figure 6.5) with severe background clutter, the morphological filter which 
performs better is given a higher weight. Since the individual filter which would actually 
perform better in a particular case would not be known a-priori, the fusion at least serves 
the purpose of selecting the better filter. 

To explore the reasons for the non-optimality of the CFAR. approach, the method of 
thresholding the log likelihood ratio (LLR) was first, used in place of the CFAR fusion. 
The results of thresholding likelihood ratio are shown in Figure 6.6. The outputs of the 
individual detectors, the likelihood ratio detector using each filter, and the fused likelihood 
ratio detector are shown for amplitudes of 6.0 and 8.0. The amplitude used for computing 
the likelihood ratio was of 6.0, which gave the signal template as: 


pe - fio = (6.0,5.5603)' = 6.0(1.0,0.9267)' 

which is only slightly different from the LMP template (scaled), due to the non-linearity of 
the morphological filter. 
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Figure 6.3: Images from (a) digital (b) analog camera with partly cluttered background. 
Image of the local variance of low-stop filter output, which is the measure of clutter for 
images from (c) digital (d) analog camera. Histogram of the local variance of low-stop filter 
output for images from (e) digital (f) analog camera. 


86 










(a) (b) 



FA FA 

(c) (d) 


Figure 6.4: Operating curves for digital camera image using CFAR fusion: Assuming cor- 
relation between filters with target amplitudes of: (a) 6.0 (b) 8.0 Assuming independence 
between filters with target amplitudes of: (a) 6.0 (b) 8.0 
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Sranal amplrtude - 20 Sianal amotrtude - 40 



Figure 6.5: Operating curves for analog camera image using CFAR fusion: Assuming corre- 
lation between filters with target amplitudes of: (a) 20.0 (b) 40.0 Assuming independence 
between filters with target amplitudes of: (c) 20.0 (d) 40.0 
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(a) (b) 



Figure 6.6: Operating curves for digital camera image using LLR thresholding: Using cor- 
relation between filters with target amplitudes of: (a) 6.0 (b) 8.0 Assuming independence 
between filters with target amplitudes of: (c) 6.0 (d) 8.0 
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Figure 6.7: Operating curves for digital camera image using LLR thresholding, and fixed 
value of morphological variance parameter o 2 m = 1.5, and assuming independence between 
the filters, for target amplitudes: (a) 6.0 (b) 8.0 


However, it was seen that for the matched signal strength of 6.0, it. still did not give 
desirable performance. Hence, another reason for this non-optimality was explored. It was 
observed that the variance parameter of the morphological filter output was underes- 
timated from the images. This unreliability of was because the estimation was performed 
using small windows around every point in the image. Furthermore, there was quantization 
error, since the noise in the images was of the same order as the gray level resolution of the 
real images. However, since the morphological filter is comparatively insensitive to clutter, 
the value of a ^ remains approximately same throughout the image. Hence, the entire back- 
ground image from the digital camera (without adding targets) was used to pre-compute the 
parameter value as = 1.5. The low-stop filter parameter of was estimated as before, since 
its value does depend on the local clutter level. The correlation coefficient was assumed to be 
zero. The results obtained using these parameters are much better, and shown in Figure 6.7. 

Hence, it can be concluded that the performance of CFAR approach was poor due to the 
following reasons: 

1. CFAR fusion is not optimal under the condition of constant target amplitude. 

2. The morphological filter parameters are not reliably estimated from small sized win- 
dows. 
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Figure 6.8: Operating curves for digital camera image using CFAR approach with condition 
of optimality, and fixed value of morphological variance parameter er^ = 1.5, assuming 
independence between the filters, where the targets have amplitude such that the deflection 
coefficient d is constant equal to: (a) 10.0 (b) 15.0 


However, as shown before, the CFAR approach is theoretically optimal, when the tar- 
get amplitude is not constant, but is adjusted so that the deflection coefficient d given by 
equation (6.17) remains constant. To check the optimality of the CFAR approach for this 
condition, another set of experiments was performed. The statistical parameters of the low- 
stop filter were estimated at every pixel using the background image without the addition 
of targets. The morphological filter parameters were estimated for the entire image (instead 
of individual pixels). Using the parameters of the low-stop and morphological filters, the 
deflection coefficient d\ for a unit amplitude of the signal was computed at every pixel, and 
stored as a separate image. False alarm rate was also estimated using this image as before. 
For estimating mis-detection rates, targets were added to the background image. The ampli- 
tude of the target at a particular pixel was given by d/d\ where d\ is the function of the pixel 
coordinates and d is constant. The mis-detection rate was then estimated from a number of 
such images. The LMP template was used for fusing the outputs of the individual filters. 
The plots of the mis-detection rate against false alarm rate are shown in Figure 6.8. It can 
be seen that the fusion output is better or as good as the individual filter outputs, within 
experimental error. 
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Chapter 7 

Detection of Translating Objects 


In addition to the detection of objects on a collision course, it is useful to monitor the objects 
which are crossing the aircraft. For this purpose, a system was designed to specifically detect 
objects having a translational motion in the image. To distinguish translating objects from 
ground or cloud clutter, the following criteria were used: 

1. The object should have sufficient signal strength. 

2. The object should have an image velocity greater than a threshold. 

3. The object should have a consistent mot ion i.e., its velocity must not change abruptly. 

The system to detect translating objects has been implemented on the pipelined image 
processing system, the DataCube MaxPCI described in Section 2.8 to obtain real time perfor- 
mance. The system was mounted on the Air Force Total In-Flight. Simulator (TIFS/NC1314) 
aircraft, and flight tests were conducted by NASA with another aircraft flying in front of it. 
The detection and tracking of the target aircraft were demonstrated during the flight test. 

This system is divided into two stages, an image processing stage and a tracking stage. 
The first stage consists of image processing steps which remove most of the clutter, and 
isolate potential features which could be translating objects. This stage involves repetitive 
image operations such as convolution, pointwise operations, histograms, etc. which are 
suitable for a pipelined architecture, and can be performed in integer format. Hence, these 
steps are implemented on the DataCube machine. The output of this stage is a list of image 
features which are likely to contain the target objects, including their positions and the signal 
strengths. However, the list may also contain features corresponding to background clutter, 
which are not separated by the simple image processing steps of the first stage. The second 
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stage tracks these features to distinguish the genuine translating objects from background 
clutter using the criteria mentioned above. Since the first stage has reduced the volume of 
data to be operated on, more complicated target tracking algorithms can be implemented 
even on the host PC associated with the DataCube. The threshold used in the first stage 
is adjusted dynamically to give a nearly constant number of features for the second stage 
so that they can be processed in real time using the slower host. This matching of the 
output rate of one stage to the input rate of the next stage is known as the rate constraint 
criterion [5]. 


7.1 Image processing stage 


This stage performs the basic image processing steps to suppress clutter and extract features 
which could potentially be translating targets. 


1. Resolution Reduction: The resolution of the image is reduced so that the system is 
capable of operation in real time. The image is convolved with the following low-pass 
filter mask and then down-sampled by two in both horizontal and vertical directions. 

1 4 6 4 1 

4 16 24 16 24 

6 24 36 24 6 

4 16 24 16 24 

1 4 6 4 1 

Low-pass filtering suppresses high frequencies, which would otherwise have been aliased 
to low frequencies by the down-sampler. Although the image resolution is reduced, the 
signal to noise ratio is actually enhanced. This is because the target size is usually 
greater than 2 pixels, leading to spatial integration of the target, contrast. 


M n = 


256 


2. Low-stop filtering: A low-stop filter is applied to the reduced image to suppress back- 
ground clutter. The filter is implemented by convolving the image with the following 
masks, one after the other: 



1 4 6 4 1 

2 8 12 8 2 

1 4 6 4 1 


Mo = - 


128 


0 2 3 2 0 

2 8 12 8 2 

3 12 -108 12 3 

2 8 12 8 2 

0 2 3 2 0 
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The mask M\ is a smoothing mask, which performs spatial integration for large targets. 
A rectangular mask is used since the targets are expected to have a greater width than 
height. Application of the mask A/ 2 is equivalent to subtracting a smoothed image 
from the input image. The overall result of the two convolutions is the subtraction 
of a low- pass filter output with a larger mask from a low-pass filter output with a 
smaller mask. Hence, this step suppresses uniform background intensity and weak 
clutter corresponding to low frequencies, and also performs spatial integration for larger 
objects. 

3. Image differencing: Image differencing is performed on the low-stop filtered images by 
subtracting consecutive frames. This is equivalent to a low-stop filter in temporal 
direction. Since the object is assumed to be translating, image differencing suppresses 
stationary objects corresponding to background clutter. It should be noted that steps 
1 to 3 are theoretically interchangeable, since they are all linear filters. However, 
since these operations are performed with integer arithmetic of limited precision, the 
particular order of the steps is used to reduce the truncation error. 

4. Non-maximal suppression: Directly using the output of the previous step would give 
rise to a large number of features for an extended target. Non-maximal suppression 
is performed to get a single feature (or sometimes a small number of features) for the 
entire target. Pixels can have both positive or negative values corresponding to bright 
and dark targets, respectively. Hence, an absolute value image is first formed, and 
every pixel which is not a local maximum in its 3x3 neighborhood is marked. The 
marked pixels are set to zero in the original image - i.e., the image before taking the 
absolute values. 

5. Histogram formation: To extract candidate features, the output from the above steps 
should be thresholded. Furthermore, the threshold should be chosen so that the number 
of features neither overloads the tracking stage, nor keeps it unnecessarily idle. Hence, 
the threshold is selected so that, the number of pixels exceeding the threshold is less than 
or equal to a fixed rate which matches the operation speed of the tracking stage. For this 
purpose, a histogram of the image is constructed. The threshold then is determined as 
the smallest pixel value for which the number of elements in the histogram bins above 
this value does not exceed the fixed rate. Applying this value as the threshold would 
then ensure that the number of features remains bounded. 
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6. Thresholding and feature output: Pixels in the image with the output value greater 
than the threshold are separated as features, and their positions as well as the ampli- 
tudes are transmitted to the tracking stage. 


7.2 Tracking stage 


This stage maintains a list of tracks containing the frame number, unique ID, position, 
velocity, and amplitude. The list is empty in the beginning, he following steps are repeated 
for every frame for which the list of features is received from the image processing stage: 


1. Track update: For each track in the list of t racks, the list of features is scanned to obt ain 
features in a neighborhood window around the track position. If one or more such 
features are found, the one with the largest amplitude is selected as the continuation 
of the track. Using the coordinates (z\,Z2) of this feature, as well as the current track 
position (£1,0:2) and velocity ( r/ 1 . //•,>). the expected position and velocity for the next 
frame is estimated using a Kalman filter. The filter is applied separately for horizontal 
(i — 1) and vertical (i = 2) directions. For each direction, the state vector is given 


by X{ = [ x 


Ui 


, and the observation is the feature coordinate z t . The track life 
n of the track is the number of frames in which the target has been observed, with 
adjustments made in the frames where the target is not observed. The measurement 
update is given by: 


xf (n) = x i (n) + I\\ (n) (z, - x t ) 

uf(n) = Ui(n) + K 2 (n) (Zi - Xi) (7.1) 


The state update is given by: 

Xi(n + 1) = (n) + (n) 

Ui(n + 1) = w+(n) (7.2) 

The Kalman filter matrix K(n) — [ A', (n) K 2 (n) ] is pre-computed using the in- 
verse covariance formulation of the Kalman filter. The computation is performed for 
a number of n = where N is large enough so that K (N) does not change 

significantly with N. 

The track amplitude is updated using recursive averaging according to the following 
equation: 

F(n + 1) = f(n) + aF(n) (7.3) 


95 



where F(n) and F(n + 1 ) are the track amplitudes for the current and next frames, f(n) 
is the feature amplitude, and o is the forgetting factor. The track life n is incremented 
by one. 

If no feature satisfying the above conditions is found in the neighborhood of the track, 
the position and velocity are extrapolated using only the state update. Theoretically, 
this would mean that the values of the Kalman filter matrix would have to be recom- 
puted. To avoid such a computation, the value of the track life n is reduced bv a factor 
to approximately simulate the effect of having ‘lost track’ of the feature. The feature 
amplitude is updated using f(n) = 0 in equation (7.3). 

2. Formation of new tracks: After all the current tracks are updated, features in the fea- 
ture list are used to check for new tracks. For every feature, the list of tracks is scanned 
to see if a track is already there in its neighborhood. If not, a track is created out of 
the feature with its track life n = 1. Its position (xi,.^) will be the same as feature 
position ( 21 , z 2 ), whereas velocity ( 14 , u 2 ) is initialized to zero. The actual velocity will 
be computed only in the next frame. 

3. Pruning the list of tracks: If the number of tracks is too large, the stage can get over- 
loaded and fail to operate in real time. To eliminate this possibility, if the number of 
tracks are greater than a particular number, the weakest tracks are deleted. 

4. Merging similar tracks: It may happen that two or more tracks may be formed corre- 
sponding to the same object. Hence, tracks which are very close to each other and have 
nearly the same velocity are merged, retaining the one with the larger track amplitude. 

5. Output: Tracks which satisfy the criteria of the object, including having an amplitude 
larger than a threshold, as well as other factors are output as potential objects. 

7.3 Results 

The real-time image capturing, recording, and processing system were demonstrated by 
the flight tests conducted by NASA. During the first, set of flight tests, image sequences 
were captured and recorded successfully at, the rate of 30 frames per second. The tracking 
algorithms were designed and fine-tuned using these image sequences. During the next set 
of flight tests, in addition to the real-time capturing and recording, the translating target 
tracking algorithm was executed concurrently at the rate of 15 frames per second. Several 
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Figure 7.1: Tracking algorithm applied on an image sequence with the target aircraft trans- 
lating from right to left at a distance of 3 nautical miles. The target aircraft is located at 
the end of the track in this image. 


image sequences with the target aircraft crossing the host aircraft were obtained. It was 
observed that the system successfully detected and tracked the translating object during the 
flight tests. Figure 7.1 shows a trace of the tracking algorithm applied on an image sequence 
with the target aircraft translating from right to left at a distance of 3 nautical miles. 

Table 7.1 summarizes the performance of the translating target tracking algorithm with 
different distances between the host and the target aircraft, during the first set of flight 
tests. The false alarm rate is measured as the ratio of the total number of false alarms 
throughout the sequence to the number of image frames in the sequence. The mis-detection 
rate is measured as the ratio of the number of frames in which the target was missed to the 
total number of frames. The false alarm rate depends on the amount and motion of clutter 
in the images, whereas the mis-detection rate depends on the target size and contrast, and 
therefore increases with the target distance in most cases. Since false alarms can be very 
annoying to the pilots, a low false alarm rate was more desirable than a low mis-detection 
rate. Hence, the parameters of the algorithms were selected to reduce the false alarm rate, 
and were same for all the scenarios. It is possible to get a better performance by adjusting 
parameters according to the characteristics (such as the clutter level) of each scenario. 
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Table 7.1: The performance of the translating target detection algorithm for a number 
of target distances. The false alarm rate is the ratio of the total number of false alarms 
throughout the sequence to the number of image frames in the sequence. The mis-detection 
rate is the ratio of the number of frames in which the target was missed to the total number 
of frames. 


Distance (nmi) 

Mis-detection rate 

False alarm rate 

1.5 

0.061 

0.000 

1.8 

0.113 

0.000 

2.0 

0.394 

0.000 

2.4 

0.059 

0.000 

3.0 

0.056 

0.000 

4.7 

0.335 

0.183 

5.0 

0.803 

0.147 

5.4 

0.643 

0.000 


The performance was relatively poor in the cases where the host aircraft rotated about 
its own axes, resulting in large image motion of background features. To improve the perfor- 
mance, the image motion due to aircraft rotation should be compensated using the aircraft 
navigation data. If this data is unavailable, the background motion should be modeled to 
separate independent object motion. For example, Irani and Anandan [9] separated the scene 
motion into planar and parallax components, and identified independently moving objects 
having a significant parallax. However, since the DataCube architecture is capable only of 
simple image processing operations, any such procedure would have to be performed on the 
host machine, using a feature based approach. 


98 




Chapter 8 
Conclusion 


This research was focused on designing and implementing algorithms for detection of obsta- 
cles in the flight path of the aircraft using the image sequences obtained from the on board 
cameras. The main contributions of this research and the possible avenues of future work 
are described below. 


8.1 Contributions of this research 

• Basic algorithms performing signal enhancement were tested for detecting flying ob- 
jects using the image sequences provided by NASA. Performance characterization of 
these algorithms vras conducted using simulated and real image sequences. It was ob- 
served that the algorithms performed well on images with little or no clutter, but their 
performance degraded in presence of clutter. 

• To distinguish the objects on a collision course from the background clutter, the differ- 
ence in the behavior of their image translation and expansion were studied. Conditions 
under which these criteria are useful were derived. Novel methods for estimating the 
rates of image translation and expansion over long image sequences were designed and 
tested on the image sequence with a large amount of background clutter. The approach 
successfully separated the obstacle from the clutter. 

• Algorithm fusion to overcome limitations of algorithms was studied, and it was observed 
that under proper conditions, a combination of algorithms performed better than the 
individual algorithms. 
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• A real-time system using pipelined image-processing hardware was designed to detect 
objects crossing the aircraft. The tracking algorithm to separate background clutter 
from crossing objects was developed and implemented on the host machine associated 
with the system. 

8.2 Future work 

• Many of the research ideas, such as the use of translation and expansion, algorithm 
fusion, etc. were tested individually. The future goal would be to combine these into an 
integrated system for obstacle detection. Performance characterization of this system 
could be done with more real image sequences. 

• During the estimation of image translation to discriminate a hazardous object from 
background clutter, the compensation of aircraft rotation was performed using the 
navigation system information. Use of background clutter to model the aircraft motion 
could be explored, so that the compensation could be performed even without the 
navigation system information. 

• False expansion occurring due to the rotation of the target aircraft can be studied. 
This expansion takes place only in a particular direction, resulting in deformation 
of the object in the image. Methods to distinguish this deformation from uniform 
expansion can be studied. 

• Gaussian models were used for studying the behavior of individual algorithms to per- 
form algorithm fusion. Better models could be developed, especially in presence of 
clutter, where the Gaussian models would not be as robust. 

• To improve the performance of crossing object detection, the image motion due to 
aircraft rotation should be compensated. This could be done either using the navigation 
data from the aircraft, or by modeling the image motion separate independent object 
motion. Since the DataCube architecture is capable only of simple image processing 
operations, such a procedure should be performed on the host machine, using a feature 
based approach. 
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Part III 


Real-Time Implementation of Obstacle Detection Algorithms 
on the Datacube MaxPCI Architecture 


Abstract 


A system was designed to capture image sequences from a digital camera, record the 
images into a high speed disk array, and process the images to perform real-time obstacle 
detection. A set of obstacle detection algorithms were chosen and implemented on the same 
system. The objective was to detect any potential obstacle in the aircraft’s flight path by 
analyzing the images captured using an on-board camera in real-time. Using this system, real- 
time image data was recently obtained successfully from flight tests conducted at NASA 
Langley Research Center. It was observed that the system successfully detected and tracked 
translating objects during the flight test. The recorded digital image sequences are valuable for 
further research on obstacle detection algorithms under different conditions. 
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Chapter 1 
Introduction 


A proper hardware platform should be chosen to implement the obstacle detection 
algorithms in real-time. The fastest choice is a customer designed multi-processor circuit board, 
but it’s very expensive and less flexible since the circuit board needs to be changed to reflect the 
modification of the algorithms. To optimize the performance/price ratio, a general purpose real- 
time image processing system may be considered as the processing unit of the on-board synthetic 
vision system. The Datacube MaxPCI, a general purpose real-time image processing system 
with pipelined image processor, is a cost-effective way to meet high-throughput low-latency 
demands and has become popular among some researchers working on real-time vision 
problems. The New Technology Disk (NTD) available with the Datacube MaxPCI has the 
required ability to perform high-speed digital image recording, which is also an important part of 
our project. Moreover, the obstacle detection algorithm should be reliable and fast so that even a 
small target can be detected in real-time. A set of obstacle detection algorithms were chosen and 
implemented on a Datacube MaxPCI system. 

Chapter 2 gives an overview of the system performing real-time image capturing, recording, 
and processing. Chapter 3 deals with the implementation issues of obstacle detection algorithms 
on the MaxPCI system. Chapter 4 explains the implementation of the obstacle detection 
algorithms on the old MaxVideo system. Finally, a conclusion is given in Chapter 5. 
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Chapter 2 

System Overview 


A system was designed to capture image sequences from an on-board digital camera, record 
the images into a high speed disk array, and process the images using multiple pipelined 
processors to perform real-time obstacle detection. Using this system (shown in Figure 2.1), 
real-time digital image data was recently obtained successfully from flight tests conducted at 
NASA Langley Research Center. These image sequences are valuable for further research on 
obstacle detection algorithms under different conditions (size, contrast, background etc). 


KODAK 

Motorized 



with 4x16 GB capacity windows NT Workstation 


Figure 2.1: The system consists of a Windows NT workstation with 
two internal Datacube MaxPCI image processing cards, and one 
New Technology Disk (NTD) Recorder. The sensor consists of a 
KODAK Megaplus ES1.0 digital CCD camera and a PENTEX 1- 
inch lens with motorized aperture control. 
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2. 1 Image capturing using remote digital CCD camera 
and motorized lens 

A critical component of the vision system is the imaging sensor. A Kodak Megaplus 
ES1.0 charge coupled device (CCD) digital camera with a Cosmicar/Pentax 1 inch (50 mm) 
motorized lens was chosen, since digital CCD cameras offer superior performance compared to 
their analog counterparts. Digital cameras are also highly immune to the spatial and temporal 
artifacts caused by transmission-line noise. The Kodak ES1.0 captures 30 frames per second 
with a IK x IK resolution in a 8 bit format (256 gray levels) [7], It was mounted in the cockpit 
of a modified Convair C-131 aircraft, called the Total In-Flight Simulator (TIFS). Because the 
recording system was located in the aft portion of the aircraft, a 100 foot digital data cable 
transferred the image data signals to the recording system. The synchronized clock signals 
generated by the camera were also transferred through the data cable. A good quality cable with 
low capacitance prevents asynchronism and noise that can occur with lengthy cables. 

The dynamic range of the captured images was very large due to the variations in factors 
such as the sun orientation, cloud conditions, and aircraft altitude. To prevent saturation or very 
low gray levels in the captured images, a motorized aperture lens was installed on the camera 
and a remote aperture control box next to the recording system (100 feet from the camera). 
With this motorized aperture, the operator manually adjusted the aperture during the flight. The 
camera exposure control software provided by Kodak was not used because that could 
inadvertently produce blurred images (caused by extended exposures) or unacceptable noise 
levels (caused by brief exposures). Furthermore, it was easier for the operator of these 
experiments to use a manual knob to change the lens aperture rather than use the software to 
change the exposure time of the camera. 


2.2 Real-time recording of digital image sequences using 
new technology disk 

A typical flight sequence with a target aircraft in the field-of-view can last several 
minutes and produce thousands of IK x IK images. One means of reducing the massive amount 
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of disk storage space to hold these images is to use compressive algorithms. However, because 
it was desirable to analyze the raw characteristics of the camera, uncompressed images were 
stored. For this task, a system recording data at a rate of 30 MB/second (or 1.8 GB/minute) was 
required. 



Figure 2.2 : Our system with two Datacube MaxPCIs and a NTD. The 
system is a Pentium PC workstation, consisting of two Datacube 
MaxPCIs and one New Technology Disk (NTD) Recorder. The first 
MaxPCI is equipped with a storage 96 and a General Purpose PSMOD. 

The second MaxPCI is equipped with a Convolver 200 and another 
General Purpose PSMOD. 

To satisfy these large bandwidth and storage requirements, a Pentium 233 workstation 
(running Windows NT) with two internal MaxPCI cards from Datacube, Inc., and an external 
disk array, called the New Technology Disk (NTD), were used (shown in Figure 2.2). The 
MaxPCI is a real-time image processing card with pipelined image processors. It provides a 
cost-effective way to meet high-throughput, low-latency demands. The data cable was 
connected from the digital camera to one of the MaxPCI cards. The camera sends images 
through two channels, with odd lines in one channel and even lines in the other channel. The 
MaxPCI card receives the images through these two channels, with a throughput of 1 5 MB/sec 
from each channel. The MaxPCI card was configured so that the two channels are merged to 
form complete images in the MaxPCI’s memory. Then the images are sent via the MaxVGA bus 
to the MaxVGA card for display, and via the PCI bus to the Adaptec AIC-1 160 disk controller 
card for storing. Transfer to the disk controller card was accomplished using High Speed Image 
Access (HSIA), a technique that moves data directly back and forth between the disk controller 
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card and the MaxPCI’s memory, without being copied in an intermediate memory buffer. This 
eliminates the copying of data to host memory as it would be required by other disk storage 
products. Finally, the images are transferred through a Fibre Channel (FC) cable to the NTD, 
where the images are stored. The Fibre Channel is a technology for transmitting data between 
computer devices at a data rate of up to 1 GB/sec. Since it is three times faster than the Small 
System Computer Interface (SCSI), Fibre Channel is expected to replace SCSI as the 
transmission interface between servers and storage devices. The NTD is a Redundant Array of 
Independent Disks (RAID) sub-system that enables high-speed lossless digital image recording 
and playback. The NTD used was a four disk array, with 16 GB per disk. With the FC option, 
it is possible to achieve NTD transports in excess of 32 MB/sec. To achieve the highest access 
speed, there is no formatting of data storage on the NTD. All images are recorded as plain raw 
data to the consecutive physical sectors of the NTD. The NTD can record and playback images 
at a real-time frame rate up to 40MB/sec. 

Datacube offers a helpful graphical user interface (GUI) called MaxLab to control the NTD. 
The interface of the MaxLab is like a VCR panel. MaxLab was used in the first flight test for 
image data recording. However, MaxLab is a commercial package useful only for NTD control, 
so no additional image processing tasks can be performed simultaneously while recording. 
Another C-callable library, NtdlfLib, is also available for programmers to create their own NTD 
access programs according to their needs. The NtdlfLib is integrated with ImageFlow, a C- 
callable library that configures and manages data transfers on the MaxPCI to perform real-time 
image processing. With the power of the NtdlfLib and ImageFlow programming, the system 
can not only record the digital images in real-time, it can also be extended to perform several 
image processing algorithms concurrently. It should be noted that it is not a simple job to 
develop a parallel program on the MaxPCI since the programmer must know a good deal about 
the underlying hardware. Moreover, since there is no useful debug tool for ImageFlow at this 
time, the programming task using ImageFlow is time-consuming. However, a very satisfactory 
system can be developed with appropriate effort. 
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2.3 Aircraft maneuvers in the flight tests 


Two aircraft were involved in these flight tests, which were based at NASA Langley 
Research Center. The TIFS was the host aircraft, and it carried the Kodak camera and Datacube 
computer. A Beechcraft King Air B-200 was the target aircraft. The purpose of the flight tests 
was to obtain images containing different maneuvers conducted by the target aircraft. For all 
maneuvers, the host aircraft had an altitude of 3500 feet and a speed of 159 knots. Two classes 
of maneuvers were flown. 

In the translating maneuver (shown in Figure 2.3(a)), the target aircraft translated (moved) 
in the image sequence. It was performed with the target aircraft crossing perpendicular to the 
direction of motion of the host aircraft. The speed of the target aircraft was 159 knots. This 
maneuver was performed for different vertical and horizontal separations. Images were recorded 
with the target aircraft 500 feet below and 500 feet above the host aircraft at distances of about 
1, 2, 3, 4 and 5 nautical miles. Recording ended when the target aircraft left the field of view of 
the camera. 

In the contraction maneuver (shown in Figure 2.3(b)), the target aircraft maintained a fixed 
position in the image surface as it flew away from the host aircraft. The target aircraft speed 
was 209 knots. Images were recorded with the target aircraft ascending at 500 feet per minute, 
descending at 500 feet per minute, or maintaining a fixed altitude. Recording ended when the 
target aircraft was about 5 miles from the host aircraft. The images from this sequence can be 



(a) (b) 

Figure 2.3: Two kinds of flight maneuvers, (a) Translating maneuver: the 
target aircraft crossed in front of the host aircraft with a vertical separation 
of 500 ft. (b) Contraction maneuver: the target aircraft maintained a fixed 
position in the image window by flying directly away from the host aircraft. 
The target aircraft maintained the same altitude as the host aircraft. (Not 
drawn to scale.) 
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played backwards to simulate the target aircraft motion that occurs with a collision. 

Two flight tests were conducted over a several day period in January and September, 1999. 
The first flight test focused on image capturing and recording, while the second flight test 
performed image capturing, recording, and processing concurrently. By recording images over a 
multiple day period in each flight test, a range of contrast conditions was obtained. In addition, 
the background of the target varied depending on its altitude. This approach provided a 
comprehensive set of images for testing the image processing algorithms under different 
conditions. 


2.4 Real-time image processing using MaxPCI image 
processing cards 

The MaxPCI is a real-time image processing card with pipelined processors 
manufactured by Datacube Inc. It uses a Windows NT workstation as a host machine and 
supports multiple simultaneous pipelines that can be switched by software at read-time frame 
rates. Each MaxPCI card consists of five modular hardware devices and a set of memories 
connected by a large programmable switch as shown in Figure 2.4. The first device is the 



Figure 2.4: Each MaxPCI is composed of a MaxAcq acquisition unit, 
a MaxVGA display unit, five Advanced Memorys (AM), two 
Arithmetic Units (AU), two Look-Up Tables (LUT) and two 
PSMOD add-on modules. 
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MaxAcq acquisition unit that receives either a digital or an analog signal from the camera. The 
second device is the Max VGA display unit that outputs the video signal to the Max VGA video 
card. The third device is the Arithmetic Unit (AU) which performs arithmetic and logical 
operations. The fourth device is the Look-Up Table (LUT) that performs pixel value 
transformations. The fifth device is the Advanced Memories (AM) component which can 
receive an image from the cross-point switch and transmit another image to the cross-point 
switch at the same time. The AM also allows the host computer to read or write pixels via the 
PCI bus. Moreover, each MaxPCI may be extended by the selection of two add-on processing 
and storage modules (PSMOD). The variety of the PSMODs enables users to balance their 
needs of processing, memory and resources. All of these devices operate on pixel arrays at 40 
MHz. The MaxPCIs communicate with each other through two buses called Maxbuses. Each 
Maxbus has a bandwidth of 160 MB/sec. The MaxVGA is a separate display card, which inputs 
images from the MaxPCI through a private MaxVGA bus. Hence, the display can be accelerated 
without interfering with the PCI bus traffic. ImageFlow programming allows the programmer to 
specify connections between the processing elements inside hardware devices, as well as 
between ports on the cross-point switch. It also provides access to attributes associated with 
each processing element. 


2.5 Obstacle detection algorithms 

Obstacle detection using image processing requires robust, reliable and fast techniques. 
These techniques should provide a high probability of detection while maintaining a low 
probability of false alarm in noisy, cluttered images of possible targets, exhibiting a wide range 
of complexities. The size of the image target can be quite small, from sub-pixel to a few pixels 
in size. Furthermore, the detection algorithm must report such targets in a timely fashion, 
imposing severe constraints on their execution time. Finally, the system must not only work 
well under the controlled conditions found in a laboratory and with data closely matching the 
hypothesis used in the design process, but it must be insensitive — i.e., must be robust — to data 
uncertainty due to various sources, including sensor noise, weather conditions, and cluttered 
backgrounds. 
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Over the past year, several algorithms were combined to form a composite system for 
detection of contracting targets [5]. The steps that form this composite system are: 

(1) Temporal Averaging: For objects in a uniform background, having a very small image 
motion, such as those on a collision or near-collision course. When the target motion is 
small, temporal averaging improves the SNR and reduces the processing rate required for 
subsequent steps. 

(2) Pyramid construction with low-stop or morphological filtering: This is a pre-processing 
step in which a pyramid is constructed to accommodate different sizes and velocities of 
objects. Low-stop filtering [6] is performed at each pyramid level to remove background 
intensity. There is an option to perform morphological filtering [2] in place of low-stop 
filtering at every pyramid level. This can be done when the background contains clutter 
due to clouds and/or ground to improve performance. 

(3) Dynamic Programming: A dynamic programming algorithm [3] is performed on pre- 
processed frames to integrate the signal over a number of frames by taking the target 
motion into consideration. 

It should be noted that one or more of these steps can be bypassed so that any of the basic 
algorithms described above can be tested individually using the same system. The above target 
detection algorithms were implemented on the Datacube MaxPCI system. 

In addition to detection of objects on a collision course, it is useful to monitor the objects 
that are crossing in front of the aircraft. For this purpose, a system was designed to specifically 
detect objects which have a translating motion in the image. To distinguish translating objects 
from ground or cloud clutter, the following criteria was used: (1) The object should have 
sufficient signal strength. (2) The object should have an image velocity greater than a threshold. 
(3) The object should have a consistent motion. 

The algorithm to detect translating objects has also been implemented on the Datacube 
MaxPCI system to obtain real time performance. The system was mounted on the host flight 
aircraft and performed well in detecting and tracking objects. The algorithm is divided into two 
concurrent parts. These parts are: (1) Image processing steps which remove most of the clutter, 
and isolate potential features which could be translating objects. These steps consist of temporal 
differencing, low-stop filtering, non-maximum suppressing (NMS) and feature extraction. 
These image operations are suitable for a pipelined architecture, and can be done in integer 
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format. Hence, these steps are implemented on the Datacube MaxPCI machine. The output of 
this part is a list of image points which are likely to contain the target objects, along with their 
signal strengths. (2) Tracking these features using a Kalman filter to distinguish genuine 
translating objects from background clutter which was not separated by the simple image 
processing steps of the first part. Since the first part has reduced the volume of data to be 
operated on, more complicated target tracking algorithms can be implemented even on the host 
PC associated with Datacube. The threshold used in the first part is adjusted dynamically to give 
a nearly constant number of features for the second part so that they can be processed in real 
time using the slower host. This is known as the rate constraint [4]. 


2.6 Backup from the new technology disk to a high 
capacity tape driver 

The total capacity of the NTD is 64 GB, which allows 36 minutes of real-time recording 
of IK x IK images at 30 frames per second. This capacity is sufficient for the capturing test of 
each flight. However, after each flight, it is necessary to backup the contents of the NTD, so the 
NTD space could be used again for the next flight. For this, the NTD was temporarily removed 
from the aircraft and directly connected via fibre channel to another Windows NT server with a 
64 GB local hard disk array and a Seagate Sidewinder 200 high capacity tape drive with 
autoloader ability. The backup process can be divided into two steps. The first step is to copy 
the contents of the NTD into the hard disk of the server. Because there is no formatting of data 
storage on the NTD, the Windows NT cannot access the NTD directly. Therefore, a C-callable 
low-level library, called NtdLib, is used to access the NTD. The second step is to backup the 
data from the local hard disk to the high capacity tape drive. The Seagate Sidewinder 200 is a 
high-capacity tape autoloader that combines Advanced Intelligent Tape (AIT) and autoloader 
technology to provide data storage for high-end computer systems. The autoloader technology 
enables the loading of four cartridges at the same time and it can exchange the cartridges 
automatically. The native (uncompressed) capacity of each cartridge is 25 GB, and three 
cartridges were required for each NTD backup. The speed of the first step is limited by the 
throughput of the local hard disk, while the speed of the second step is limited by the tape drive. 
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It was observed that the first step takes about three hours to complete while the second step takes 
another eight hours. Since the NTD is not needed in the second step, it is available for the next 
capturing task after the first step. Moreover, since both steps can be scheduled to execute 
automatically, no human interaction is required during the whole process of the backup. 
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Chapter 3 

Implementation of Obstacle 
Detection Algorithms on MaxPCI 

3.1 Available resources 

Our Datacube IP system is equipped with two MaxPCI IP cards. Each card has multiple 
pipelined processors, memory and other resource. Up to two PSMODs can be installed on each 
MaxPCI. The first MaxPCI in the test flight system was equipped with a Storage96 (ST) and a 
General Purpose (GP) PSMOD, while the second MaxPCI was equipped with a GP and a 
Convolver200 PSMOD. Table 3.1 lists the main resources in the first MaxPCI card. Table 3.2 
lists the main resources in the second MaxPCI card. It should be noted that ST is the same as 
AM while GP is similar to AU. The only difference between AU and GP is that each AU is 


Table 3.1: The number of main resources in the first MaxPCI card. 


MaxPCI #0 Resource 

Abbreviation 

Amount (ID) 

Arithmetic Unit 

AU 

2(0-1) 

Arithmetic Memory 

AM 

5 (0-4) 

Look-Up Table 

LUT 

2(0-1) 

Delay Element 

DLY 

2(0-1) 

General Purpose add-on PSMOD 

GP 

4 (0-3) 

Storage96 add-on PSMOD 

ST 

6 (0-5) 

Analog Acquisition Module 

QA 

1 


12 



independent of one another while GPs are hardwired to be paired sequentially. This implies that 
once GPO is used, the output of GPO will feed into GP1 automatically. Although GP1 can be 
configured to do nothing but only pass through the input data, GP1 will be occupied and cannot 
be used for any other purpose to avoid resource conflict. Another important constraint is the 
communication channels (CH) between the two MaxPCI cards. There are only eight CH 
channels, with eight bits in each channel. Thus, the traffic between two MaxPCIs should be 
minimized to preserve the precious CHs. The first MaxPCI was equipped with QA acquisition 
module to input an analog signal, while the second MaxPCI was equipped with a QI acquisition 
module to input a digital signal from a camera. 


Table 3.2: The number of main resources in the second MaxPCI card. 


MaxPCI #1 Resource 

Abbreviation 

Amount (ID) 

Arithmetic Unit i 

AU 

2(0-1) 

Arithmetic Memory 

AM 

5 (0-4) 

Look-Up Table 

LUT 

2(0-1) 

Delay Element 

DLY 

2(0-1) 

Convolver200 add-on PSMOD 

VD 

1 

General Purpose add-on PSMOD 

GP 

4(0-3) 

Digital Acquisition Module 

Qi 

1 


3.2 Pipelined cost model 


Experiments were performed on the MaxPCI system to analyze the delay of ImageFlow 
pipes. The result shows that the total delay of processing is equal to the sum of each serial 
pipe’s delay, while each serial pipe’s delay is proportional to the number of input pixels. It is 
concluded that the delays of each schedule mainly depends on its input image size and the 
number of sequential pipes. The following cost function for processing delay is obtained from 


13 

















the experiments. Suppose a subtask T k is divided into n sequential pipes: 
{pipe k i , pipe k 2 , • ■ • , pipe kn } , then 


Delays(T k ) = ]jT Delay ( pipe ki ) , for 1 < k < C PS 


<=i 


Delay(pipe kj ) = C ki + (T CC x « ki ) + (f cc x sj 

Delays{T k ) is the total delay of the subtask T k . 

C PS is the number of sequential pipes. 

Delay{pipe ki ) is the delay of the sequential pipe pipe ki . 
where •{ C ki is the configuration time of pipe kj . 

T cc is the clock cycle time. 

n kj is the number of gates in pipe kr (the complexity of the pipe) 
[s i( . is the size of input image of pipe kr 


3.3 Pipelined scheduling 

The concurrency can be either spatial concurrency or temporal concurrency. The spatial 
concurrency (parallelism) means that tasks can be executed by several processors 
simultaneously, while the temporal concurrency (pipelining) means that chains of tasks can be 
divided into stages, with every stage handling results obtained from the previous stage. The 
temporal concurrency can be exploited to increase the throughput whenever a long sequence of 
image processing tasks is applied on a continuous image sequence. 

Consider an example shown in Figure 3.1. For simplicity, suppose each task has the 
same computation cost (40 ms). On a single processor case shown in Figure 3.1(a), the total 
iteration period is 200 ms (5 FPS). For scheduling techniques that only consider parallelism 
(shown in Figure 3.1(b)), only the three middle tasks can be sped up, thus the critical path limits 
the iteration period to 120 ms (8.3 FPS). For scheduling techniques that only consider 
pipelining (shown in Figure 3.1(c)), the middle stage becomes the bottleneck, limiting the 
iteration period to be 120 ms (8.3 FPS). In fact, the task graph has both spatial and temporal 
concurrency. To maximize the throughput, both types should be exploited. Figure 3.1(d) shows 
the scheduling of three pipelined stages, with the second stage having three tasks executed in 
parallel. If five processors are available, this partition is optimal, resulting in an iteration period 
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(a) Single processor (b) Pure Parallel Scheduling 



Iteration period = 1 20ms Iteration period = 40ms 

Throughput = 8.3FPS Throughput =25FPS 


(c) Pure Pipelined Scheduling (d) Parallel & Pipelined Scheduling 
Figure 3.1: Parallel and Pipelined scheduling 



(a) Spatial (b) Temporal (c) Three pipes with both 

Pipelining Pipelining spatial and temporal pipelining 


Figure 3.2: Spatial and Temporal pipelining 

of 40 ms (25 FPS). Therefore, an ideal scheduling strategy should exploit both spatial and 
temporal concurrency present in the task graph. 

Moreover, there are two levels of pipelined scheduling. The first level is the pixel level 
pipelining, or the spatial pipelining. The spatial pipelining means that an image is fed into a 
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processing pipe pixel by pixel. Suppose there is a processing pipe using three processors, like 
figure 3.2(a). Whenever a new pixel is fed into pi, an old pixel will be fed from pi to p2, so 
that p2 can process the old pixel while pi processes the new pixel. Since the MaxPCI are 
designed with pipelined processors, spatial pipelining is ready as soon as we declare, configure 
and fire a processing pipe. The second level is the frame level pipelining, or the temporal 
pipelining. The temporal pipelining means that continuous image frames are fed into several 
processing pipes frame by frame. Suppose there is a processing pipe requiring nine processors. 
We can divide the pipe into three different pipes, each pipe with three processors. Then image 
frames can be fed into these pipes in a pipelined way, like Figure 3.2(b), so that three pipes can 
be executed concurrently. Both spatial and temporal pipelining can be exploited at the same 
time in this example. Figure 3.2(c) shows that three pipes execute concurrently (frame level 
pipelining), while three processors work concurrently in each pipe (pixel level pipelining). 

In order to use the spatial pipelining, a processing pipe cannot flow through the same 
resource more than once. The resource includes processor (AU), memory (AM), look-up table 
(LUT) and MaxPCI channels (CH). In order to use the temporal pipelining, more care should be 
taken to avoid resource conflict. Since several pipes execute concurrently at the same time by 
temporal pipelining, none of these pipes can share the same resource. Moreover, the number of 
concurrent pipes is also limited because there are only six internal clocks in each MaxPCI. 

The design of pipeline scheduling can be divided into three steps. The first step is to 
partition the dependency graph into several pipeline stages with minimum cut value. The second 
step is to allocate resources for each pipeline stage. The final step is to schedule the operations 
inside each pipe stage using allocated resources. Currently, all these steps are implemented and 
optimized manually by performing an exhaustive search on all possible schedules, and 
calculating the processing delay of each schedule using the cost function. The schedule with 
minimum processing delay is the best schedule. However, this brute-force method is time- 
consuming and impractical for large graphs. A scheduling heuristic should be developed to 
approximate the optimal schedule according to the cost function in acceptable time. 
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3.4 Basic concept of MaxPCI programming 

There are two ways to program the Datacube MaxPCI hardware. ImageFlow 0 is a low- 
level library of C-callable functions that configure and manage data transfers on the Datacube 
MaxPCI pipeline processing devices. ImageFlow allows the programmer to specify connections 
between the processing elements inside hardware devices, as well as between ports on the 
crosspoint switch. It also provides access to attributes associated with each processing element. 
Programmers cannot simply state that two image streams are to be added; they must specify 
which ALU is to be used, where in memory the images are stored, and the path the images will 
take to reach the ALU. It’s a programmer’s job to handle resource conflicts. On the other hand, 
Datacube Wit is a high level package that allows image processing computations to be described 
in terms of coarse-grained dataflow graphs. With the Datacube Wit, programmers no longer 
have to specify how images will be stored in memory, what paths they will take through the 
switch, which computational elements will perform a specific function, or how many 
computational resources there are. If there is any resource conflict in the graphs, the Datacube 
Wit scheduler detects the conflict automatically and schedules the jobs sequentially to solve the 
conflict. However, Datacube Wit generalizes the resources mapping and cannot take full 
advantage of the MaxPCI hardware. Moreover, Datacube Wit cannot support system with 
multiple MaxPCI cards and is not a mature product at this time. In our project, ImageFlow was 
used to develop optimized programs of all the obstacle detection algorithms. 

An algorithm should be defined as multiple parallel pipes to accomplish the desired tasks 
efficiently. These algorithms are then mapped to a sequence of pipeline processing elements. 
The basic steps 0 to program each ImageFlow pipe are explained as follow. 

Attaching Surfaces 

The source memory buffers (called surfaces) at the beginning of a pipe are attached to 
the transmission gateway. The destination surfaces at the end of a pipe are attached to the 
receiving gateway. Gateways are always connected to data surface stores. While there can be 
multiple data surface objects on a single surface store, there can only be one data surface object 
attached to a gateway at a time. Attaching the particular data surface to the gateway makes 
explicit which data surface object will participate in the pipe. 

Connecting Pipes 
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After the pipeline processing elements are defined, processing pipelines are built by 
setting programmable switches, routing the data through the appropriate sequence of elements, 
and tying multiple elements together into a processing pipeline. The attributes of each pipelined 
processing element are set so that the element performs the desired processing operations. 

Creating Pipes 

Pipes should only be created after the pipe has been connected, its elements’ 
programmable attributes have been set, and gateways have been attached to all its surfaces. The 
pipe creation function takes the destination surface objects as its first argument, which can be 
either single destination or multiple destinations. The function’s second argument is its trigger 
type. This can be “single shot” which transfers a single frame of image data, or “continuous” 
which transfers image data continuously. 

Arming Pipes 

Arming the pipe is performed after the pipe set-up work is complete. The arm command 
initiates PC ImageFlow tracing back through all the connections and attributes that were set to 
determine the exact organization and structure of the pipe. PC ImageFlow then calculates all the 
correct register loads for configuring the IP system hardware to match your software settings, 
and finally makes the register loads. After arming, the pipeline has been constructed, but data 
has not begun to flow. 

Firing Pipes 

The actual image processing tasks are performed by firing data through each of these pipes. 
The pipes actually start the data transfer from the source surfaces to the destination surfaces. 
Data can be fired though the pipe as either a single shot or a continuous image sequences. 

Every ImageFlow program should have at least three pipes, the acquisition, the 
processing and the display pipe. The acquisition pipe obtains image sequences from the camera 
while the display pipe offers a stable output for the monitor. In many applications, it’s too 
complicate to handle the whole processing in one pipe, thus, the processing is usually partitioned 
into many pipes. 
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3.5 Advanced features of MaxPCI programming 


3.5.1 Pipe altering thread 

One of the most important features of ImageFlow programming is the Pipe Altering 
Thread (PAT). The use of PAT can reduce re-arm time for a pipe, and is vital to efficient 
applications. When the pipe is armed, all the delays and configuration information will be 
calculated and this consumes a significant part of the execution time. With a PAT, these steps 
are performed in advance and the results of the calculation are stored, and then loaded when the 
pipe is armed. Of course, the loading time is very fast compared to the time for all the 
calculations of the arming operation, and that is the advantage of the PAT. PAT provides an 
option to speed up the image processing by pre-calculating the pipe delay and parameter setting. 
However, it also increases the complexity of the ImageFlow programming. 


3.5.2 Double-buffering 

Double-buffering can be helpful when consecutive continuous pipe transfers run at 
different frame rates. For example, our camera has an acquisition rate of 30 frames per second 
and the MaxPCI has a processing rate of 40 frames per second. With mismatched frame rates 
like these, the read can outrun the write, producing read-crossing-write errors that result in splits, 
or jumps in moving objects in the processed image. To eliminate problems like this, and to 
improve the overall system performance, we may consider using double-buffering. In double- 
buffering, separate buffers for read and write operations are created on a single memory surface. 
While one frame is being written to the surface, the previous frame can be read out. The read 
must wait for the completion of the write. Then the buffers are swapped, the write buffer 
becomes the read buffer, and vice versa. The double-buffering technique is used in our 
implementation whenever the MaxPCI records to the NTD, playbacks from the NTD, and 
acquires from the camera. 
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3.5.3 High speed image access 

High Speed Image Access (HSIA) is a technique that moves data directly back and forth 
between the disk controller card and the MaxPCI’s memory, without being copied in an 
intermediate memory buffer. This eliminates the copying of data to host memory as it would be 
required by other disk storage products. The HSIA port on the MaxPCI offers good 
performance in accessing image data via the PCI bus. HSIA also supports image data transfer 
between the host memory and the MaxPCI’s memory. An ImageFlow data transfer pipe can be 
configured to perform eight, sixteen, or thirty-two bit HSIA transfers using one, two, or four 
image memories in MaxPCI. HSIA enables data transfer pipes to be declared inside a PAT, thus 
increasing the transmission speed and programming flexibility. 


3.6 Detection of translating targets 

The implementation of detection for translating targets can be divided into three 
subsystems: the record/playback subsystem, the image processing subsystem, and the tracking 
subsystem. The first and second subsystems are implemented on Datacube MaxPCI cards, while 
the third subsystem is implemented on the host CPU. All three subsystems should be executed 
in parallel to make the whole system run as fast as possible. Hence, care should be taken to 
avoid any resource conflict and reduce the communication between subsystems. 


3.6.1 Recording/Playback subsystem 

Both the recording and playback portions of the record/playback subsystems need to 
handle double-buffering, because the NTD operates at a faster rate than the MaxPCI. 

Recording 

Real-time image recording is also an important issue in our project. The Kodak camera 
generates data on two channels. One channel contains odd rows, while the other channel 
contains even rows. The NTD can also read from two buffers. However, the NTD expects that 
one buffer contains odd columns, while the other buffer contains even columns. In order to 
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solve this conflict, two channels acquired from the camera should be merged into one lkxlk 
image using PA_G_BOOL, then be separated into two buffers vertically (shown in Figure 3.3). 
PA_G_BOOL is a small hardware element that can merge two images using a simple logical 
operation. The merged image is also useful for display purposes. The HSIA port on the 
MaxPCI offers good performance in accessing image data via the PCI bus. 



(a)Algorithm (b) Implementation 

Figure 3.3: NTD Recording. 



Odd Columns Even Columns 


(a)Algorithm (b) Implementation 

Figure 3.4: NTD Playback. 
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Playback 


Real-time playback from the NTD is useful for algorithm testing and development. 
Basically, the implementation of playback is similar to recording, with a reverse order of 
pipelined connections (shown in Figure 3.4). In fact, the implementation of playback is simpler 
because the camera is no longer necessary, and we can skip the acquisition pipe. 


3.6.2 Image processing subsystem for translating targets 

This subsystem performs the basic image processing steps to suppress clutter and extract 
features which could potentially be translating targets. The image processing subsystem is the 
most complicate subsystem, and occupies the most resources. Figure A. 5 shows an overview of 
the image processing subsystem. 

Temporal differencing 

Image differencing was performed on the low-stop filtered images by subtracting 
consecutive frames. This is low-stop filtering in temporal direction. Since the object was 
assumed to be translating, image differencing would suppress stationary objects corresponding to 
background clutter. It should be noted that the first three steps are theoretically interchangable, 
since they are all linear filters. However, since these operations were performed with integer 
arithmetic of limited precision, care was taken to reduce the truncation error to a minimum. The 
Aml_2 and Aml_3 were used as a temporal buffer to store the last frame and the frame before 
last frame (shown in Figure 3.5). The difference image was obtained by subtracting the frame 
before the last frame from the current frame. 

Sub-sampling 

Sub-sampling was used to divide the image size by a factor of two, so that the system 
was capable of execution in real time. Low-pass filter was performed before down-sampling to 
reduce the loss of sub-pixel information. This step reduced the resolution of the image by two. 
However, since the target size was usually greater than two pixels, this step actually enhanced 
the signal to noise ratio due to the spatial integration performed by the low-pass filter. A 5x5 
Gaussian lowpass filter was implemented using the VD_NMAC_A in the Convolver200 shown 
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in Figure 3.6. Sub-sampling was performed by adjusting both horizontal and vertical zoom 
factors of the receiving gateway in the AM1_4. 


Odd Columns Even Columns 



Gpljl Subtraction 

i 

(a)Algorithm (b) Implementation 

Figure 3.5: Temporal Differencing. 
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(a)Algorithm (b) Implementation 

Figure 3.6. Sub-sampling. 
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Low-stop filtering 

Low-stop filter was used to suppress the background clutter. The filter was formed by 
subtracting a low-pass filter of large size from a low-pass filter of a much smaller size. The 
filter mask used is shown in Figure 3.7. The filter effectively subtracts from every pixel the 
mean of its neighborhood, thus suppressing uniform background intensity and weak clutter. 
Since a half set of Convolver200 resources was occupied by the step of sub-sampling, only 100 
points of Convolver200 (VD_NMAC_C and VD_NMAC_D) can be used in this step. In order 
to use these 100 points efficiently, two sequential 7x7 low-stop filters were used to simulate a 
big 13x13 kernel low-stop filter. 

Non-maximum-suppressing 

Non-maximum-suppressing (NMS) was used to extract the local maximum. If the 
magnitude of a pixel was not greater than all of the neighbor magnitudes, then the magnitude of 
the pixel was set to zero. After applying NMS, the number of final features can be reduced and 
the overall throughput can be increased. A 3x3 NMS was implemented by subtracting the 
original image from the image after 3x3 dilation. It is required to detect both positive and 
negative targets. However, two sets of 3x3 dilation would consume a lot of resources, especially 
AU. In order to save precious resources, an absolute operation was performed before NMS, so 
that only one set of dilation was required (shown in Figure 3.8). 

Histogram accumulation and automatic threshold selection 

To extract candidate features, the image obtained from the above steps should be 
thresholded. Furthermore, the threshold should be chosen so that the number of features neither 
overloads the tracking subsystem, nor keeps it unnecessarily idle. Hence, the threshold was 
selected so that the number of pixels exceeding the threshold was less than a fixed rate that 
matches the operation speed of the tracking subsystem. For this purpose, a histogram of the 
image was used. The HP in MaxPCl can generate a general histogram. The histogram consists 
of 256 bins. Each bin occupies two bytes. Then the histogram was accumulated using the 
AU0_1 to find out the threshold value. The LUT0_0 was configured according the automatic 
selected threshold value. Finally, pixels in the image with the amplitude value smaller than the 
threshold value were suppressed using the LUT0_0 (shown in Figure 3.9). 
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(a)Algorithm (b) Implementation 

Figure 3.7: Low-stop filter. 



(a)Algorithm (b) Implementation 

Figure 3.8: Non-maximum suppressing. 
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Feature extraction 

The pixels passing the thresholding were extracted as features. The position and 
amplitude of each feature were transmitted to the tracking subsystem. The information of 
features can be extracted using the statistic module inside AM shown in Figure 3.10. Feature 
coordinates were extracted using AM0_2, while feature amplitudes were extracted using 



ABS NMS 

"~T 

Thresholding j Thresholding LUT 
THRESH 


(a)Algorithm 



(b) Implementation 


Figure 3.9. Histogram accumulation and automatic threshold selection. 



(a)Algorithm (b) Implementation 

Figure 3.10: Feature Extraction. 
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AM0_3. Since the threshold value in the last step was selected so that the number of pixels 
exceeding the threshold is less than a fixed rate, the number of features was guaranteed to fit into 
the memory surface safely. Since the tracking subsystem was implemented using the host CPU, 
the features were transferred from the MaxPCl to the host memory so the tracking subsystem 
can read from the host memory directly. The technique of HSIA was used to efficiently perform 
the feature transmission through the PCI bus. 


3.6.3 Tracking subsystem 

This subsystem maintained a list of tracks containing the frame number, a unique ID, 
position, velocity, and amplitude. The list was empty in the beginning. The following steps 
were repeated for every frame: 

(1) Input: The list of features was received from the image processing subsystem. 

(2) Track update: For each track in the list of tracks, the list of features was scanned to obtain 
features in a neighborhood window around the track’s position. If there were any such features, 
the strongest of these features was selected as the continuation of the track. Using the 
coordinates of this feature, as well as the current track position and velocity, the expected 
position and velocity for the next frame were estimated using a Kalman filter. If no such feature 
was found in the neighborhood of the track, the position and velocity were extrapolated in the 
same Kalman filter framework, using only the current values for the track. The track amplitude 
was updated using recursive averaging with a forgetting factor: 

F n + \= F „ +aF „ 

where F n and F n+] are the track amplitudes for the current and next frames, 

F n is the feature amplitude, and 
a is the forgetting factor. 

The feature amplitude F n is zero if no feature is found. 

(3) Formation of new tracks: After all the current tracks were updated, features in the feature 

list were used to check for new tracks. For every feature, the list of tracks was scanned to see if 
a track was already there in its neighborhood. If not, a track was created out of the feature. Its 
position should be the same as the feature position, whereas velocity was initialized to zero. The 
actual velocity was computed only in the next frame. 
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(4) Pruning the list of tracks: If the number of tracks was too large, the subsystem can get 
overloaded and fail to operate in real time. To eliminate this possibility, if the number of tracks 
were greater than a particular number, the weakest tracks were deleted. 

(5) Merging similar tracks: It may happen that two or more tracks may be formed 
corresponding to the same object. Hence, tracks that were very close to each other and had 
nearly the same velocity were merged, retaining the one with larger amplitude. 

(6) Output: Tracks which satisfy the criteria of the object including amplitude larger than a 
threshold, and other factors were output as potential objects. 

Table 3.3 summarizes the required resources and execution throughput for the operations 
described in this section. The required resources are based on the implementations on the 
Datacube MaxPCI system. The reported execution throughput is based on an input image with 
lkxlk resolution. The only exception is the histogram accumulation operation that uses a 256 
bin (512 byte) histogram as the input. Therefore histogram accumulation executes much faster 
than all the other operations. 

Table 3.3: The required resources and execution throughput for operations 
implemented on the Datacube MaxPCI system. 


Operations 
(input size) 

AM 

AU 

DLY 

CH 

LUT 

VD 

Response 
time (ms) 

NTD Record/Playback (lkxlk) 

2 

0 

0 

1 

0 

0 

24.9 

Camera Acquire (lkxlk) 

2 

0 

0 

0 

0 

0 

34.1 

Temporal Differencing (lkxlk) 

H 

2 

0 

1 

0 

0 

29.7 

Non-maximum suppression 
(lkxlk) 


5 

3 

3 

2 

0 

29.7 

Histogram Accumulation 
(256 bins) 

i 

1 

0 

0 

1 

0 

0.3 

Feature Extraction (lkxlk) 

2 

0 

0 

0 

1 

0 

29.7 
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3.7 Detection of contracting targets 

The implementation of the detection algorithm for contracting targets can be divided into 
three subsystems: the record/playback subsystem, image processing subsystem, and tracking 
subsystem. The record/playback subsystem is exactly the same as the record/playback 
subsystem for translating targets (explained in previous section). The tracking subsystem for 
contracting targets is not implemented yet, though it is similar to the tracking subsystem for 
translating targets. The image processing subsystem is explained in the following section. 


3.7.1 Image processing subsystem for contracting targets 

This subsystem performs the basic image processing steps to suppress clutter and extract 
features that could potentially be contracting targets. Two algorithms were designed for the 
detection of contracting targets. The first algorithm performs a lowstop filter followed by six 
dynamic programming operations (shown in figure A2), while the second algorithm performs 
three morphological filter operations followed by six dynamic programming operations (shown 
in figure A3). Both algorithms use a pyramid construction operation as a pre-processing step. 
The algorithms can be separated into several individual operations. The individual operations 
were implemented successfully on the Datacube MaxPCI system, however, extended work is 
still required to arrange and connect individual operations together into a complete subsystem 
for contracting targets. 

Pyramid construction 

To detect targets of a number of different sizes and velocities, spatial integration can be 
performed by forming an image pyramid. A hierarchy of images, each with half the resolution 
of the previous one was formed. A low-pass Gaussian filter was performed before down- 
sampling to reduce the loss of sub-pixel information. A three-level pyramid construction can be 
divided into three sequential pipes (shown in Figure 3.1 1). The first pipe smoothed and sub- 
sampled the original lkxlk image into a 512x512 image, and copied the Ikxlk image to the 
destination memory surface. The second pipe smoothed and sub-sampled a 512x512 image into 
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a 256x256 image, and copied the 512x512 image to the destination memory surface. Finally, the 
third pipe copied the 256x256 image to the destination image. It should be noted that these three 
pipes cannot be executed concurrently because they need to write into the same destination 
memory surface (Aml_3). 




Ikxlk 


Am1_3 Ikxlk 


Ami 2 


Lowpass filter 
Sub-sampling by 2 

512x512 




256x256 


(a)Algorithm 

Figure 3. 


(b) Implementation 
Pyramid Construction. 
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(a)Algorithm (b) Implementation 

Figure 3.12: Low-stop filter. 
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Lowstop filter 

A low-stop filter was used to suppress the background clutter. The filter was formed by 
subtracting a low-pass filter of large size from a low-pass filter of a much smaller size. The 
filter effectively subtracts from every pixel the mean of its neighborhood, thus suppressing 
uniform background intensity and weak clutter. Since the VD_NMAC_A in the Convolver200 
was occupied by the step of pyramid construction, the VD_NMAC_C was used in this step. The 
filter mask used is shown in Figure 3.12. 

Morphological filter 

In the case of an image with little or no clutter, a low-stop can be used. However, if the 
background has significant clutter, the low-stop filter is not as effective in removing it. A 
morphological filter [2] can remove large sized features (usually clutter), while retaining small 
sized features (usually targets). A difference between the original image and its morphological 
opening (top-hat transform) outputs small-sized positive targets — i.e., bright targets in dark 
background — whereas the difference between the morphological closing and the original image 
(bottom-hat transform) outputs negative targets, i.e., dark targets in bright background. Each of 
these images can be separately used to detect targets. 

The main operation inside a morphological filter is either a dilation followed by an 
erosion (closing) or an erosion followed by a dilation (opening). A dilation (erosion) can be 
done by configuring an arithmetic unit (AU) to perform a maximum (minimum) operation. 
Delay elements (DLYO, DLY1, DLY2, and DLY3) were included in the pipe to adjust the 
alignment properly (shown in Figure 3.13). 

Temporal averaging 

To decrease the probabilities of false alarms and missed detections, one can integrate 
observations spatially or temporally. In case of stationary targets, optimal detection can be 
achieved by adding (or averaging) the images in the sequence, and thresholding the output. A 
recursive filter was used with a forgetting factor ‘a’ between 0 (full forgetting) and 1 (no 
forgetting). The output F(k) at time k is given in terms of the input f(k) as: 
F(0) = 0, F(k) = (\-a)f(k) + aF(k~\) 
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(a)Algorithm 


(b) Implementation 
Figure 3.13: Morphological Filter. 
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(a)Algorithm (b) Implementation 

Figure 3.14: Temporal Averaging. 

In order to reduce the truncation error to a minimum, a 16 bit averaging operation was 
desirable rather than an 8 bit averaging operation. An arithmetic unit (AU) was configured to 
perform a weighted 16 bit averaging. The Am0__0 was set to stored the high byte, while the 
AmO_l stored the low byte of the averaging output (shown in Figure 3.14). 
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Dynamic programming 

In case of moving targets, the temporal averaging filter does not improve the detection. 
A dynamic programming algorithm [3] is more effective in detection of moving targets. The 
algorithm is based on shifting the images before averaging them so as to align the target to be 
detected. Since the velocity of the target could be arbitrary, the velocity space (u,v) is 
discretized within the range of possible target velocities, and for each discrete velocity on the 
grid, an intermediate image F is created recursively using: 

F(x,y;u,v; 0) = 0 

F(x, v;u,v;k) = f(x, y;k) + a max F(x , -u,y-v\u,v\k- 1) 

(x\y')zQ(.x.v) 

Finally, a maximum operation is performed at time N, when the result is to be reported: 

F m (x, y; K) = max F(x, y; u, v;K) 

Om) 

Each dynamic programming consists of four pipes. Each pipe is a recursive temporal 
averaging with an additional dilation (maximum) in the loop (shown in Figure 3.15). The 
Gp0_0 was configured to perform the weighted averaging, while the GpO_l was configured to 
perform a maximum operation with four inputs. The delay elements (DLYO, DLY1) were 
inserted in the proper position to achieve the correct alignments for the four inputs of the 
maximum 



(a)Algorithm 


(b) Implementation 


Figure 3.15: Dynamic Programming. 
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Table 3.4 summarizes the required resources and execution throughput for the operations 
described in this section. The required resources are based on the implementations on the 
Datacube MaxPCI system. The reported execution throughput is based on an input image with 
lkxlk resolution. A smaller input size should result in a larger throughput. 


Table 3.4: The required resources and execution throughput for operations 
implemented on the Datacube MaxPCI system. 


Operations 
(input size) 

AM 

AU 

DLY 

CH 

LUT 

VD 

Response 
time (ms) 

Pyramid Construction (3 levels: 
lkxlk, 512x512, 256x256) 

2 

0 

1 


1 

1 

39.2 

Lowstop Filter (lkxlk) 

■ 

■ 



■ 

■ 

29.7 

Morphological Filter ( 1 kx 1 k,both 
positive and negative, 3x3 kernel) 

2 

10 

1 


0 

0 

29.7 

Temporal Averaging (lkxlk) 

3 

1 

0 

1 

0 

0 

29.7 

Dynamic Programming (lkxlk) 

5 

9 

4 

7 

0 

0 

29.7 


3.8 Results of the implementation on MaxPCI 

Figure 3.16 shows a trace of the tracking algorithm applied on an image sequence with the 
target aircraft translating from the right to the left side of the image. A detection is shown by 
drawing a small black square around the detected position. The distance between the host and 
target aircraft is 3 nautical miles in this scenario. 
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Figure 3.16: Tracking algorithm applied on an image sequence with the target 
aircraft translating from the right to the left side of the image at a distance of three 
nautical miles. The target aircraft is located at the end of the track in this image. 

Table 3.5 summarizes the performance of the translating target tracking algorithm for a 
number of distances between host and target aircraft. The false alarm (FA) rate is the ratio of 
the total number of false alarms throughout the sequence to the number of image frames in the 
sequence. The miss detection (MD) rate is the ratio of the number of frames in which the target 
was missed to the total number of frames. The false alarm rate depends on the amount of clutter 
in the images, whereas the miss detection rate depends on the target size and contrast, and 
therefore increases with the target distance in most cases. Since false alarms can be very 
annoying to pilots, a low false alarm rate was more desirable than a low miss detection rate. 
Hence, the parameters of the tracking algorithm were selected deliberately to reduce the false 
alarm rate. The tracking parameters are the same for all scenarios. It is possible to get better 
performance by adjusting parameters individually according to the characteristics (such as the 
clutter level) of each scenario. 
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Table 3.5. The performance of the translating target tracking algorithm for a number of target 
distances in nautical miles (nmi). The false alarm (FA) rate is measured as the ratio of the total 
number of false alarms throughout the sequence to the number of image frames in the sequence. 
The miss detection (MD) rate is measured as the ratio of the number of frames in which the 
target was missed to the total number of frames. The algorithm was executed at 1 5 FPS on the 
Datacube MaxPCI system. 


Description 

No. 

Target 

Brightness 

Target 

Direction 

Distance 

(nmi) 

Ground 

Clutter 

Cloud 

Clutter 

MD 

FA 

Target 500 
feet below 
TIFS 

1 

Negative 

R to L 

1.5 



0.061 

0.000 

2 

Negative 

L to R 

1.8 



0.113 

0.000 

3 

Negative 

R to L 

3.0 

* 


0.056 

0.000 

4 

Positive 

R to L 

4.7 

** 


0.363 

0.180 

5 

Positive 

R to L 

5.0 

** 


0.803 

0.147 

Target 500 
feet above 
TIFS 

6 

Negative 

R to L 

1.5 



0.061 

0.000 

7 

Negative 

R to L 

j_ ! 

2.0 



0.092 

0.000 

8 

Negative 

i 

R to L 

3.0 


* 

0.057 

0.000 

9 

Negative 

R to L 

4.7 

** 


0.335 

0.183 

Random 

traffic 

encounter 

10 

Negative 

R to L 

1.2 


** 

0.159 

0.063 

11 

Positive 

L to R 

2.4 



0.059 

0.000 

12 

Negative 

R to L 

3.0 

* 


0.053 

0.000 

13 

Negative 

R to L 

3.0 


* 

0.089 

0.000 

14 

Negative 

L to R 

3.0 


** 

0.524 

0.386 

15 

Negative 

R to L 

5.0 

* 


0.192 

0.038 

16 

Negative 

R to L 

5.4 

* 

* 

0.643 

0.000 
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Chapter 4 

Implementation of Obstacle 
Detection Algorithms on MaxVideo 


Before the delivery of the MaxPCI system in October 1998, some parts of the obstacle 
detection algorithms were implemented on an old MaxVideo 200 system. MaxVideo 200 is an 
earlier Datacube model that works slower and has fewer hardware resources. Two examples are 
given in this section to explain the implementation on the MaxVideo system. 

No matter whether the MaxPCI or the MaxVideo is used, the process of Imageflow 
programming can be divided into four steps. The first step is to define the algorithms as 
multiple parallel pipes to accomplish the desired tasks efficiently. These algorithms are then 
mapped to a sequence of pipeline processing elements. After the pipeline processing elements 
are defined, processing pipelines are built by setting programmable switches, routing the data 
through the appropriate sequence of elements, and tying multiple elements together into a 
processing pipeline. Third, the attributes of each pipelined processing element are set so that the 
elements perform the desired processing operations. Finally, the actual image processing tasks 
are performed by firing data through each of these pipes. Data can be fired though the pipe as 
either a single shot or a continuous sequence of images. 


4. 1 Differences between the old MaxVideo system and 
the current MaxPCI system 

There are several improvements for the current MaxPCI system over the old MaxVideo 
system (Table 4.1). First, the clock rate of the new system increases from 20 MHz to 40 MHz. 
This means the data in the pipes can be fired two times faster than the current system. Second, 
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Table 4. 1 : Differences between the old MaxVideo system and the current MaxPCIs 
system. 



Old System 

Current System 

Pipelined Accelerator 

MaxVideo 200 

MaxPCI x 2 

Clock Rate 

20MHz in General 
40MHz inside AM 

40MHz 

Signal Input (MaxAcq) 

Analog 

Digital or Analog 

Advanced Memory (AM) 

24MB 

32MB x 2 

Add-on Memory Module 

N/A 

16MB x 6 

(Storage 96 PSMOD) 

Arithmetic Unit (AU) 

1 

4 

Add-on Arithmetic Unit 

N/A 

4x2 

(General Purpose PSMOD) 

Add-on Convolver Unit 

8x8 Kernel 

200 Points Kernel 


(Advanced Processing Unit) 

(Convolver 200 PSMOD) 

Look-Up Table (LUT) 

8 bit LUT 

1 6 bit LUT x 2 

Clock Generator 

5 

Unlimited 

Bus Type 

VME 

PCI 

Host Machine 

Sun Sparc Workstation 

Pentium PC 


there are more Advanced Memory (AM) elements in the new system. The AM not only can be 
used as source and destination storage by the pipes, but they can also be used as intermediate 
storage inside long pipes. With more AM, the new system has the ability to build datapaths for 
more complex algorithms. Third, there are twelve Arithmetic Units (AU) in the new system. 
Compared to the current system which has only one AU, there will be fewer resource conflicts 
and we can fire more data pipes concurrently. Thus, the degree of parellelism can be increased 
and the frame rate can be improved. Fourth, the Analog Scanner (AS) is replaced by the 
MaxAcq acquisition unit, while the Analog Generator (AG) is replaced by the MaxVGA. 
MaxVGA, a separate display card, inputs images from the MaxPCI through a private MaxVGA 
bus, thus the display can be accelerated without interfering with the PCI bus traffic. Fifth, the 
functions of the Advanced Processing (AP) unit are replaced by the selection of two add-on 
Processing & Storage Modules (PSMOD). The variety of the PSMODs enables users to balance 
their needs of processing, memory and resources. Finally, the capacity of the convolver (CV) 
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and the look-up table (LUT) is also increased, giving us more flexibility to design and optimize 
the datapath for our applications. Generally speaking, the new MaxPCI system far surpasses the 
old MaxVideo system. Thus, a better frame rate can be obtained for the same obstacle detection 
algorithms on the current MaxPCI. 


4.2 Implementation of one branch of a morphological 
filter 

This section explains how to implement one branch of a morphological filter with a 2x2 
mask (Figure 4.1a) on the MaxVideo system. Generally speaking, one branch of a 
morphological filter consists of a minimum operation with four inputs followed by a maximum 



(a) the flowchart of a branch (b) hardware mapping 

in mophological filter of (a) 



(c) the configuration of AU-Nonlinear performing 
maximum operations. This is a part of pipe 1. 

Figure 4. 1 : Implementation of a branch in 
Mophological Filter 
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operation with another four inputs. However, there is only one AU in the MaxVideo which can 
be divided into separate AU_Linear and AU_Nonlinear parts. The AU_Linear can be 
configured to perform any arithmetic operation while the AU_Nonlinear can simultaneously 
perform any logical operation. Both the AU_Linear and AU_Nonlinear are capable of handling 
four inputs at a time. Thus, the processing of a morphological filter should be divided into two 
pipes. The first pipe performs a minimum operation with four inputs and the second pipe 
performs a maximum operation with another four inputs (Figure 4.1b). The four inputs of each 
AU should be a 2x2 neighborhood window, i.e. (x,y), (x+l,y), (x,y+l) and (x+l,y+l). 
MaxVideo uses delay elements to handle the correct alignment of data. The AU_DLY element 
can handle only horizontal shifts while the AP_DLY element can handle vertical shifts. By 
setting the crosspoint switchs and adjusting the element attributes, we can arrange the right 
datapath that uses the combination of AU_DLY and AP_DLY to offer the desired alignment 
(Figure 4.1c). 


4.3 Implementation of one branch of a dynamic 
programming 

Now suppose we want to implement one branch of a dynamic programming on the 
MaxVideo system. Generally speaking, the dynamic programming is a recursive averaging 
followed by a maximum operation with four inputs (Figure 4.2a). Again, there is only one AU 
in the MaxVideo so we need to divide the processing into two pipes (Figure 4.2b). In the first 
pipe, the AU_Linear is configured into a 3:1 weighting adder. AU_Linear consists of three 
adders and only one of them is used in this example. For each adder’s input, there is one 
multiplier to handle the weighting of that input. Here, before the addition is performed, the first 
input is multiplied to three times while the second input remains the same. After the addition is 
completed, there is a shifter to normalize the output. Since we want to divide the output by four, 
we configure the shifter so that the output is shifted right for two bits (Figure 4.2c). In the 
second pipe, the AU_Nonlinear is configured to perform a maximum operation with four inputs. 
The configuration is the same as the case in the example of mophological filter (Figure 4. lc). 
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(a) the flowchart of a branch (b) the hardware 
in Dynamic Programming mapping of (a) 



(c) the configuration of AU-Linear performing an 3: 1 
weighted addition operation. This is a part of pipe 1. 


Figure 4.2. Implementation of a branch 
in Dynamic Programming 

4.4 Result of the implementation on MaxVideo 

Several algorithms have been implemented on the old Datacube MaxVideo 200. Table 
4.2 shows the result performance on different sizes of input images. The first number represents 
how many frames can be executed per second, the second number in parameters represents the 
number of sequential pipes required to perform the task. From the table, we can observe that 
there are two factors affecting the frame rate. The first factor is the input image resolution. If 
the resolution of the input image is doubled, then the frame rate is reduced to about one fourth. 
The second factor is the complexity of the algorithms. More sequential pipes are required to 
map a more complex algorithm. If the number of sequential pipes is doubled, then the frame 
rate is reduced to half. 
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Table 4.2: Results of the implementation on the current MaxVideo system. The first number 
represents how many frames can be executed per second, the second number in parentheses 


represents the number of sequential pipes required to perform the task. 


Testing Algorithms 

128x128 

256x256 

512x512 

lKxlK 


fps(pipes) 

fps(pipes) 

fps(pipes) 

fps(pipes) 

Temporal Averaging 

(with acquisition pipe & display pipe) 

• Recursive 

30(5) 

30 (5) 

15(5) 

7.5 (5) 

• Hierarchical 2 frames 

30 (4) 

30 (4) 

30 (4) 

15(4) 

• Hierarchical 4 frames 

30(5) 

30(5) 

15(5) 

7.5(5) 

• Hierarchical 8 frames 

30(11) 

15(11) 

7.5(11) 

3.7(12) 

Morphological Filter (both positive & 
negative target, with acquisition pipe) 

• 2x2 mask 

30(12) 

15(12) 

6(12) 

1.9(12) 

• 3x3 mask 

BSShI 

10(20) 

| 

1.0 (20) 

• 4x4 mask 


10(20) 


1.0(20) 

Morphological Filter 

(both positive & negative target) 

• 2x2 mask 

Mi 

25.7(10) 

7(10) 

U 

• 3x3 mask 

warn 

14.5 (18) 

3.8(18) 

in 

• 4x4 mask 

ni 

14.5 (18) 

3.8(18) 

RHM 

Pyramid Construction&Lowstop Filter (3 
level, both positive & negative target, with 
acquisition pipe) 

15(9) 

15(10) 

10(11) 

2.1 (13) 

Dynamical Programming 

53 (15) 

17(15) 

4.6(15) 

1-3(15) 

Spatial-Temporal Averaging & Opitical Flow' 
(with acquisition pipe & display pipe) 

15 

6 

2 

Out of 

Advanced 

Memory 

Lowstop + Dynamic Programming (both 
positive & negative target, including 128, 256 
and 512 image pipes, with acquisition pipe & 
display pipe) 

1.37 

Morphological + Dynamic Programming 

(both positive & negative target, including 
128, 256 and 512 image pipes, with 
acquisition pipe & display pipe) 

1.01 
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Chapter 5 
Conclusion 


The feasibility of the real-time image capturing, recording and processing system was 
demonstrated by two flight tests conducted by NASA this year. During the first flight test in 
January 1999, image sequences were captured and recorded successfully at a rate of 30 
frames/second. Ten real-time image sequences with translating targets, and six image sequences 
with contracting targets were obtained, containing 90 GB (50 minutes) data total. The tracking 
algorithms were designed and fine-tuned using these image sequences. During the second flight 
test in September 1999, not only the real-time image capturing and recording was performed but 
also the translating target tracking algorithm was executed concurrently at a rate of 15 
frames/second. Output of the algorithm was displayed on an XVS display screen in the cockpit. 
Nine real-time image sequences with translating targets were obtained, containing 20 GB (22 
minutes) data. All image sequences are available upon request for further research on either 
translating or contracting obstacle detection algorithms under different conditions (size, contrast, 
background etc). It was observed that the system successfully detected and tracked translating 
objects during the flight test. 

In some image sequences, it was noticed that a lot of false alarms appeared as the host 
aircraft changing direction. The reason of these false alarms was that some static background 
clutters were mis-identified as moving targets due to the relative movements. Therefore, it 
should be possible to reduce the false alarm rate of our system by considering the movement 
data of the host aircraft. If the movement data of host aircraft can be acquired as an input of our 
system, then the tracking can be adjusted to compensate the movement of host aircraft, thus 
improve the target detection. Besides, though the individual operations for the detection of 
contracting targets were implemented successfully, extended work is still required to arrange, 
connect, and optimize individual operations together into a complete system for the detection of 
contracting targets. 
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Appendix 


A.l Hardware specification of Datacube MaxPCI 

Crosspoint backbone 

• 65x75 connections, each 8-bit. 

• 55x61 additional 1 bit connections. 

• All connections are independent and parallel for construction multiple parallel image 
pipelines. 

• Connections can be switched in real-time. 

• All run on system-wide synchronous 40 MHz pixel clock. 

Arithmetic Unit (AU) 

• Two sets of AU in each MaxPCI 

• 6 (8-bit) inputs from main crosspoint architecture. 

• 4 (8-bit) output to main crosspoint architecture. 

• Internal crosspoint allows internal re-circulation and flexibility: 10x12 for data path 
routing within AU, each 1 0-bit. 

• Linear section: Process 10 bit data with 10-bit or 20-bit output 

• Non-Linear section: 4 to 2 to 1 binary tree arrangement of 10-bit ALUs, or 2 to 1 for 
20-bit data. 

• Statistic section: Sum, min, max, count of incoming pixel values or positions. 

Look-up Table (LUT) 

• Two sets of LUT in each MaxPCI, each LUT has 16-bit input, 16-bit output 

• LUT data can be loaded through either PCI bus or pipeline transfer from crosspoint. 
Histogram Processor (HP) 

• 1024 bin 24-bit increment bin accumulator 

• 8-bit or 10-bit data histograms 

• 24-bit row or column summing 
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• Can also be used as a 10x24 LUT or a 24-bit delay line 

• All bin accumulation data can be output to either PCI bus or pipeline transfer to 
crosspoint. 

Delay Elements (DLY) 

• Two different delay elements. 

• Programmable pipeline delay/buffer elements. 

Processing and Storage Modules (PSMOD) 

• Each PSMOD has twelve 8-bit connections (4 input, 4 output, and 4 bidirectional) to 
the crosspoint. 

• Large family of PSMODs available with continuous expansion. 

• Storage96 PSMOD equips six 16MB VSIMs 

• Convolver200 PSMOD supports arbitrary 200-points convolution kernels 

• General Purpose PSMOD 

• Each module equips four complete AU devices. 

• Two 16x16 LUTs 

• Useful for complicate IP operations like grayscale morphology and median 
filter. 
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A.2 The diagrams for all implemented obstacle 
detection algorithms 


, i i , 

*p *o 
Operation 

R 

— T — 

(A) Magnitude Symbol 

Output=Operation{ Input I *P,Input2*Q) R 


i i 

<xl.yl> (x2,v2i 

Operation 

I 

(B) Shift Symbol 

Output(x,y)=Operation( Input l (x+x 1 ,y+y l )Jnput2(x+x2,y+y2)) 


Figure A1 : Symbol Representation. All the figures in 
this appendix follows these two symbol 
representation. The symbol(A) represents that the first 
input 1 is multiplied by P and the second input is 
multiplied by Q before performing the operation, then 
the output of the operation is divided by R. The 
symbol(B) represents that the first input is shifted by 
(xl,yl) and the second input is shifted by (x2,y2) 
before performing the operation. 
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Figure A2 : Algorithm I. Generally speaking, 
this algorithm performs a lowstop filter (figure 
A2-1) followed by six D.P. (Dynamic 
Programming, figure A2-2) operations. 
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Figure A2-1 : Pyramid Construction&Lowstop Filter 
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Figure A2-2 : Dynamic Programming. 
Generally speaking, the dynamic programming 
consists of four branches. Each branch 
represents one zoom and is constructed by a 
recursive averaging operation followed by two 
maximum operations, each operation has four 
inputs. 


49 


















Figure A3 : Algorithm II. Generally speaking, 
this algorithm performs a pyramid construction 
(figureA3-l) following by three mophological 
filters (figureA3-2, A3-3) and six D.P. (Dynamic 
Programming, figure A2-2) operations. 
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C onvolution Mask: 

1 4 6 4 ! , 

4 16 24 16 4 ' 

6 24 46 24 6 

| 4 16 24 16 4 

j I 4 6 4 1 


Figure A3- 1 : Pyramid Construction. In order to 
detect objects with different sizes, it’s necessary to 
get information from the images with several 
different resolutions. The pyramid construction is 
used to generate four images with different 
resolutions from the original image. 
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Figure A3-2 : Morphological Filter (High Level). 

This figure shows the high level view of a 
morphological filter. Both the dilation and erosion are 
gray-level morphologic operations. The (1x3) or (3x1) 
represents the size of the kernel of each morphologic 
operation. The first output is equal to the original 
image minus the image after opening operation, i.e. 
dilation followed by erosion. The second output is 
equal to the image after closing operation, i.e. erosion 
followed by dilation, minus the original image. 
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Figure A3-3 : Morphological Filter (Low Level). This 
figure shows the low level view of a morphological 
filter. The inputs of each minimum or maximum 
operation should be shifted according to two 
coordinate inside the operation box. 
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Figure A4 : Algorithm III. Figure A4-1 shows the 
spatial temporal smoothing and optical Flow 
computation. Figure A4-2 shows the spatial 
consistency test. Since the temporal consistency test 
is only performed on some feature points, it can be 
done on the host machine instead of on the MaxPCI. 
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Figure A4-1 : Spatial Temporal Smoothing & 
Optical Flow Computation (u,v) = (D u tD,D r /D) 
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Figure A4-2 : Spatial Consistency 
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Figure A5 : Algorithm IV. Figure A5 shows the 
temporal differencing algorithm to detect the 
translating target. Since the tracking is only 
performed on some feature points, it can be done on 
the host machine instead of on the MaxPCI. 
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